Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The OCCURS DEPENDING ON clause is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often than required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization is a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
Perhaps the COBOL compiler you are using already knows the best way to initialize an array/table? You could, for example, say:
MOVE LOW-VALUES TO WS-TABLE
I would also be interested to know whether or not you have tried
different methods of initializing the table and timed the different
attempts? What is the difference between the try illustrated above
versus (say) "INITIALIZE WS-TABLE" or other methods like:
PERFORM VARYING IND FROM 1 BY 1 UNTIL IND > WS-BYTES-IN-WS-TABLE
MOVE SPACE TO WS-TABLE (IND:1)
END-PERFORM
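Something along these lines (an untested sketch; WS-TABLE, the one-million-byte size and the repeat count are placeholders to adjust) would let you time all three approaches on your own compiler:

identification division.
program-id. init-timing.
data division.
working-storage section.
01  ws-table               pic x(1000000).
01  ws-bytes-in-ws-table   pic 9(7) comp-5 value 1000000.
01  ind                    pic 9(7) comp-5.
01  t1                     pic 9(8).
01  t2                     pic 9(8).
procedure division.
*> ACCEPT FROM TIME only gives hhmmsscc, so repeat each method
*> enough times for the differences to show
    accept t1 from time
    perform 100 times
        move low-values to ws-table
    end-perform
    accept t2 from time
    display "move low-values : " t1 " " t2
    accept t1 from time
    perform 100 times
        initialize ws-table
    end-perform
    accept t2 from time
    display "initialize      : " t1 " " t2
    accept t1 from time
    perform 100 times
        perform varying ind from 1 by 1
                until ind > ws-bytes-in-ws-table
            move space to ws-table (ind:1)
        end-perform
    end-perform
    accept t2 from time
    display "refmod byte loop: " t1 " " t2
    stop run.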
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
Probably the most efficient way is to set up an independent
record with all values initialized. Move that record to the
table as needed.
I may be a bit of an extinct Mammoth elephant but have been doing Cobol for 40 years.
A 1 MByte memory table these days is small - but doing a binary search or sort - that is really extinct.
I am not really answering for an in memory table and may advise against it. Your question: What is the most optimized method to initialize a mammoth table - that needs to be initialized constantly.
I will only address a suggestion that you write a temporary sort-work using an ISAM file. It is extremely quick and efficient. I gave up using COBOL Sort in the 1980's and moved to using temporary ISAM files and let the file system handle sorting.
I introduced a standard report structure that is two pass. The first pass is to build an ISAM file with a primary key that is say 32 bytes and the program varies the key according to user selection criteria. The 2nd pass processes the sort-work file. This works well today and is quick.
If you need more on what primary key to write, what secondary key to write then I can expand.
You wrote - initialized constantly - and that needs more explanation.
Greg
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Probably the most efficient way is to set up an independent
record with all values initialized. Move that record to the
table as needed.
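For what it's worth, the straightforward reading of that suggestion is a separate record with VALUE clauses whose layout matches one table entry byte for byte, moved in just before the entry is used. A rough sketch (the template names are made up; the ODO layout is borrowed from Kellie's later posts):

01  ws-table-counter        pic 9(5) comp-5 value 0.
01  ws-entry-template.
    05  tpl-plan            pic x(8)            value spaces.
    05  tpl-member          pic s9(9)v99 comp-3 value zero.
01  ws-repository.
    03  ws-table-items occurs 1 to 53000 times
            depending on ws-table-counter.
        05  table-plan      pic x(8).
        05  table-member    pic s9(9)v99 comp-3.

*> in the procedure division, just before filling occurrence n:
    add 1 to ws-table-counter
    move ws-entry-template to ws-table-items (ws-table-counter)

Because the template and the entry are the same length, the group MOVE is a straight byte copy, so the packed zero in the template arrives intact.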
I just tested this code as an independent record to initialize the table:
for every needed occurrence: move ws-repository to ws-table-items.
it should work just fine since an alphanumeric move is done one byte at
a time from left to right, and stops when the end of the shortest field
is encountered. I think the compiler should issue a warning message though about moving a field to a part of itself just as a notice information.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-repository.
05 format-table.
10 format-alphanumeric pic x(8) value spaces.
10 format-numeric pic s9(9)v99 comp-3 value +0.
05 ws-table-items.
10 filler occurs 1 to 53000 times depending on ws-table-counter.
15 table-plan pic x(8).
15 table-member pic s9(9)v99 comp-3.
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
On Thursday, May 31, 2018 at 9:47:11 AM UTC+10, Kellie Fitton wrote:
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
You say that you use a binary search. Wouldn't that need all
elements of the table to be initialised first?
On Friday, June 1, 2018 at 1:18:26 AM UTC-4, r.....@gmail.com wrote:
On Thursday, May 31, 2018 at 9:47:11 AM UTC+10, Kellie Fitton wrote:
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
You say that you use a binary search. Wouldn't that need all
elements of the table to be initialised first?
No. The number of entries in the table is variable.
Given that 0 <= N <= 53000, only N values will participate
in the binary search. Those M where N < M <= 53000 need not
be initialized.
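To make that concrete: with an ODO table declared roughly as below (Kellie's field names; ws-wanted-plan is invented for the example), SEARCH ALL only ever examines occurrences 1 through ws-table-counter, so whatever happens to sit in the unused tail is never looked at:

01  ws-table-counter   pic 9(5) comp-5 value 0.
01  ws-wanted-plan     pic x(8).
01  ws-repository.
    03  ws-table-items occurs 1 to 53000 times
            depending on ws-table-counter
            ascending key is table-plan
            indexed by table-index.
        05  table-plan     pic x(8).
        05  table-member   pic s9(9)v99 comp-3.

    search all ws-table-items
        at end
            display "not found"
        when table-plan (table-index) = ws-wanted-plan
            display "found " table-plan (table-index)
    end-search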
On Saturday, June 2, 2018 at 12:44:49 AM UTC+10, Rick Smith wrote:
On Friday, June 1, 2018 at 1:18:26 AM UTC-4, r.....@gmail.com wrote:
On Thursday, May 31, 2018 at 9:47:11 AM UTC+10, Kellie Fitton wrote:
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
You say that you use a binary search. Wouldn't that need all
elements of the table to be initialised first?
No. The number of entries in the table is variable.
Given that 0 <= N <= 53000, only N values will participate
in the binary search. Those M where N < M <= 53000 need not
be initialized.
Let's hear it from the OP.
She says it's a "mammoth table".
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
On Friday, June 1, 2018 at 11:35:23 AM UTC-4, robin....@gmail.com wrote:
On Saturday, June 2, 2018 at 12:44:49 AM UTC+10, Rick Smith wrote:
On Friday, June 1, 2018 at 1:18:26 AM UTC-4, r.....@gmail.com wrote:
On Thursday, May 31, 2018 at 9:47:11 AM UTC+10, Kellie Fitton wrote:
Depends on the content of the table. Only one type, say, binary? Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
You say that you use a binary search. Wouldn't that need all
elements of the table to be initialised first?
No. The number of entries in the table is variable.
Given that 0 <= N <= 53000, only N values will participate
in the binary search. Those M where N < M <= 53000 need not
be initialized.
Let's hear it from the OP.
She says it's a "mammoth table".
Let's not. The 0 and 53000 were given by the OP. The rest is
derivable from the COBOL standard.
Have you considered using a hash table rather than using a binary search?
Make the table larger, say double, and calculate a hash from the key. For example take the alpha and redefine as a binary numeric, divide by the table size and use the remainder as the 'bucket number' index to store the entry.
Then the lookup (in idealized conditions) will be a single calculation and lookup rather than a series of divides and comparisons.
Of course it is unlikely to be idealized and so an overflow mechanism will be required for when several items calculate the same 'bucket number'. This can be done by adding an 'overflow chain' field to each item. Several different strategies could be used. For example: on overflow try to put the item in the next empty bucket, or at some offset, or in a reserved overflow area.
Packing density needs to be quite low to avoid as much overflow as possible. It is usual to analyze the actual data with several algorithms in order to choose a reasonable one.
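A minimal open-addressing sketch of that idea (untested; the 8-byte key, the bucket count of 106001 and the fold-the-key loop are all invented for illustration -- the REDEFINES trick described above would do as well):

01  ws-hash-size        pic 9(9) comp-5 value 106001.
01  ws-hash             pic 9(9) comp-5.
01  ws-probe            pic 9(9) comp-5.
01  i                   pic 9(4) comp-5.
01  ws-key              pic x(8).
01  ws-buckets.
    05  ws-bucket occurs 106001 times.
        10  bkt-key     pic x(8).
        10  bkt-member  pic s9(9)v99 comp-3.

clear-buckets.
*> once, before any inserts: mark every bucket empty
    move low-values to ws-buckets.

find-bucket.
*> fold the key into a number, take the remainder as the bucket,
*> then probe forward on collision
    move 0 to ws-hash
    perform varying i from 1 by 1 until i > 8
        compute ws-hash = function mod
            (ws-hash * 31 + function ord (ws-key (i:1)), ws-hash-size)
    end-perform
    compute ws-probe = ws-hash + 1
    perform until bkt-key (ws-probe) = low-values
               or bkt-key (ws-probe) = ws-key
        compute ws-probe = function mod (ws-probe, ws-hash-size) + 1
    end-perform.
*> ws-probe now indexes the matching entry or the empty slot to
*> insert into; keep the packing density low or this probe loop
*> crawls (and spins if the table ever fills completely)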
There's no indication that her test table is the same
as the one actually used.
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
Hi Kellie
Anyway, I can give code samples if you email me directly to gregwebace at gmail.com for more. It seems Google Groups does not allow a direct email address.
So far you talk about a binary search but do not reveal what the search key is. You say it needs to be refreshed from some master data.
I can add more re:
I am very intrigued by your two pass report structure.
Greg
Have you considered using a hash table rather than using a binary search ?
Make the table larger, say double, and calculate a hash from the key. For example take the alpha and redefine as a binary numeric, divide by the table size and use the remainder as the 'bucket number' index to store the entry.
Then the lookup (in idealized conditions) will be a single calculation and lookup rather than a series of divides and comparisons.
Of course it is unlikely to be idealized and so an overflow mechanism will be required for when several items calculate the same 'bucket number'. This can be done by adding an 'overflow chain' field to each item. Several different strategies could be used. For example: on overflow try to put the item in the next empty bucket, or at some offset, or in a reserved overflow area.
Packing density needs to be quite low to avoid as much overflow as possible. It is usual to analyze the actual data with several algorithms in order to choose a reasonable one.
The table is 1 MByte sized and will be searched often so
a binary search would be more efficient and simpler. Hash
tables are used for relative files, my system is using ISAM
files. I always use ISAM files as lookup search tables when
the table size is rather huge for a binary search all.
You seem to miss the point that a hash can be used for an array in memory as well as for a relative file. A hash can be _much_ more efficient than a binary search given an adequate algorithm and a sufficiently small packing density to avoid too much overflow.
On Thursday, 31 May 2018 02:21:06 UTC+10, Kellie Fitton wrote:
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
I may be a bit of an extinct Mammoth elephant but have been doing Cobol for 40 years.
A 1 MByte memory table these days is small - but doing a binary search or sort - that is really extinct.
I am not really answering for an in memory table and may advise against it. Your question: What is the most optimized method to initialize a mammoth table - that needs to be initialized constantly.
I will only address a suggestion that you write a temporary sort-work using an ISAM file. It is extremely quick and efficient. I gave up using COBOL Sort in the 1980's and moved to using temporary ISAM files and let the file system handle sorting.
I introduced a standard report structure that is two pass. The first pass is to build an ISAM file with a primary key that is say 32 bytes and the program varies the key according to user selection criteria. The 2nd pass processes the sort-work file. This works well today and is quick.
If you need more on what primary key to write, what secondary key to write then I can expand.
You wrote - initialized constantly - and that needs more explanation.
Greg
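If it helps, a bare-bones version of that sort-work idea looks something like the following (a sketch only; the file name, the 32-byte key and the sample records are invented). OPEN OUTPUT is the whole "initialization", and READ NEXT hands the records back in key order with no SORT verb in sight:

identification division.
program-id. sortwork-demo.
environment division.
input-output section.
file-control.
    select sort-work assign to "sortwork.dat"
        organization is indexed
        access mode  is dynamic
        record key   is sw-key
        file status  is ws-status.
data division.
file section.
fd  sort-work.
01  sort-work-record.
    05  sw-key          pic x(32).
    05  sw-amount       pic s9(9)v99 comp-3.
working-storage section.
01  ws-status           pic xx.
procedure division.
*> pass 1: open output creates (or empties) the file, then write
*> whatever the user's selection criteria produce
    open output sort-work
    move "PLAN-B" to sw-key
    move 200.00   to sw-amount
    write sort-work-record
        invalid key display "write failed " ws-status
    end-write
    move "PLAN-A" to sw-key
    move 100.00   to sw-amount
    write sort-work-record
        invalid key display "write failed " ws-status
    end-write
    close sort-work
*> pass 2: sequential READ NEXT returns the records in key order,
*> so the file system has already done the sorting
    open input sort-work
    perform until ws-status = "10"
        read sort-work next record
            at end continue
            not at end display sw-key
        end-read
    end-perform
    close sort-work
    stop run.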
On 31/05/2018 3:15 PM, Greg Wallace wrote:
On Thursday, 31 May 2018 02:21:06 UTC+10, Kellie Fitton wrote:
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
This might not be the "right" question.
Maybe you need to think about whether you need a table at all, rather
than how it should be initialized...? See below.
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
I may be a bit of an extinct Mammoth elephant but have been doing Cobol for 40 years.
A 1 MByte memory table these days is small - but doing a binary search or sort - that is really extinct.
I am not really answering for an in memory table and may advise against it. Your question: What is the most optimized method to initialize a mammoth table - that needs to be initialized constantly.
I will only address a suggestion that you write a temproary sort-work using an ISAM file. It is extremly quick and efficient. I gave up using COBOL Sort in the 1980's and moved to using temproary ISAM files and let the file system handle sorting.
I introduced a standard report structure that is two pass. The first pass is to build an ISAM file with a primary key that is say 32 bytes and the program varies the key according to user selection criteria. The 2nd pass processes the sort-work file. This works well today and is quick.
If you need more on what primary key to write, what secondary key to write then I can expand.
You wrote - initialized constantly - and that needs more explanation.
Greg
I just wanted to note in passing that I was betting someone would
suggest using an ISAM file.
It's a very good solution.
(Like Greg, I too have been writing COBOL for 40+ years, so maybe it's
an "Olde Tyme Solution"... :-))
You can discuss all kinds of clever ways to optimize a binary search
(no-one so far has suggested an unbalanced or skewed chop...), You can
look at clever hashing algorithms and re-invent in memory the file
system with buckets and overflow that was implemented by ICL in the
1960s, you can use refmodding to split the table as you insert each
entry in sequence (having first initialized to high-values), but they
all obfuscate what the real requirement is:
You need to build and organize a list into a specific key sequence (and
it is a "big" list...)
Kellie put it in memory because "everybody knows" "Memory must be faster".
(Generally, of course, it is... but if you spend a great deal of time messing around with your memory-based entries and moving great hunks of
your table around, it certainly won't be as fast as you might hope.)
Given the same requirements, (and given I can't use LINQ) I would opt
for the same solution that Greg has suggested.
Here's why:
1. I HATE, LOATHE, and DETEST OCCURS DEPENDING and simply won't use it.
It is a pointless bloody waste of time that lulls you into thinking you
are using memory in an optimized way but goes ahead and allocates
maximum space anyway. You save nothing with it.
(OK, as Rick pointed out, in this case it effectively "limits" the scope
of the binary chop, but that is not compelling enough for me to change
my mind about it... :-))
2. The problem of initializing the table for different data types is
removed if you simply load it sequentially from an ISAM file.
At the same time, you can obtain a count of the entries actually loaded,
so you know what the limit is and don't NEED OCCURS
DEPENDING...(Hooray!) (You will need to write your own binary chop to
search it, but that's pretty trivial. If you REALLY want to use SEARCH
ALL then you need to use OCCURS DEPENDING.)
3. There is no need for SORT (either external or internal); ISAM sorts
it as it is created.
4. I don't like re-inventing the wheel; everything you need has already
been written by the people who wrote ISAM...
So...
1. Set up an ISAM file for "temporary" use that has the required key and element (record) structure you need. (Each record on the file will be an element in the table.) Define this file for sequential access and give
it a "fairly large" block size. (Most of the data manipulation will then
be in memory, but you don't have to worry about it.)
2. As you receive the elements, write them to the ISAM file.
3. When you need to use the table, perform a routine that reads the ISAM file and writes sequentially to the table. (Loads the table from the
file with one sequential pass.)
At this point you should stop and ask yourself why you need the table at all. Why not just get records randomly from the ISAM file?
The answer will depend on how you use the table. Are you sharing it
between several modules, for example? Once it is built does it not
change? (Until the next "set" of data causes it to be re-loaded...)
Is there a great deal of access to it (where physical IO could
accumulate to slow things down...)?
Kellie originally asked for opinions.
Given the constraints imposed by using COBOL (no LINQ available), mine
is pretty close to Greg's...
Pete.
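A sketch of step 3, assuming the sort-work file and the table fields already discussed in the thread (sw-plan and sw-amount here stand in for whatever the real record fields are, and ws-sw-status is assumed to be the file's FILE STATUS item):

load-table-from-isam.
    move 0 to ws-table-counter
    open input sort-work
    perform until ws-sw-status not = "00"
        read sort-work next record
            at end continue
            not at end
                add 1 to ws-table-counter
                move sw-plan   to table-plan   (ws-table-counter)
                move sw-amount to table-member (ws-table-counter)
        end-read
    end-perform
    close sort-work.

One sequential pass both loads and counts the entries; nothing in the table needs a separate initialization because every occurrence up to ws-table-counter has just been written.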
On Saturday, June 2, 2018 at 6:20:07 PM UTC-7, pete dashwood wrote:
On 31/05/2018 3:15 PM, Greg Wallace wrote:
On Thursday, 31 May 2018 02:21:06 UTC+10, Kellie Fitton wrote:
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
This might not be the "right" question.
Maybe you need to think about whether you need a table at all, rather
than how it should be initialized...? See below.
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
I may be a bit of an extinct Mammoth elephant but have been doing Cobol for 40 years.
A 1 MByte memory table these days is small - but doing a binary search or sort - that is really extinct.
I am not really answering for an in memory table and may advise against it. Your question: What is the most optimized method to initialize a mammoth table - that needs to be initialized constantly.
I will only address a suggestion that you write a temproary sort-work using an ISAM file. It is extremly quick and efficient. I gave up using COBOL Sort in the 1980's and moved to using temproary ISAM files and let the file system handle sorting.
I introduced a standard report structure that is two pass. The first pass is to build an ISAM file with a primary key that is say 32 bytes and the program varies the key according to user selection criteria. The 2nd pass processes the sort-work file. This works well today and is quick.
If you need more on what primary key to write, what secondary key to write then I can expand.
You wrote - initialized constantly - and that needs more explanation.
Greg
I just wanted to note in passing that I was betting someone would
suggest using an ISAM file.
It's a very good solution.
(Like Greg, I too have been writing COBOL for 40+ years, so maybe it's
an "Olde Tyme Solution"... :-))
You can discuss all kinds of clever ways to optimize a binary search
(no-one so far has suggested an unbalanced or skewed chop...), You can
look at clever hashing algorithms and re-invent in memory the file
system with buckets and overflow that was implemented by ICL in the
1960s, you can use refmodding to split the table as you insert each
entry in sequence (having first initialized to high-values), but they
all obfuscate what the real requirement is:
You need to build and organize a list into a specific key sequence (and
it is a "big" list...)
Kellie put it in memory because "everybody knows" "Memory must be faster".
(Generally, of course, it is... but if you spend a great deal of time
messing around with your memory-based entries and moving great hunks of
your table around, it certainly won't be as fast as you might hope.)
Given the same requirements, (and given I can't use LINQ) I would opt
for the same solution that Greg has suggested.
Here's why:
1. I HATE, LOATHE, and DETEST OCCURS DEPENDING and simply won't use it.
It is a pointless bloody waste of time that lulls you into thinking you
are using memory in an optimized way but goes ahead and allocates
maximum space anyway. You save nothing with it.
(OK, as Rick pointed out, in this case it effectively "limits" the scope
of the binary chop, but that is not compelling enough for me to change
my mind about it... :-))
2. The problem of initializing the table for different data types is
removed if you simply load it sequentially from an ISAM file.
At the same time, you can obtain a count of the entries actually loaded,
so you know what the limit is and don't NEED OCCURS
DEPENDING...(Hooray!) (You will need to write your own binary chop to
search it, but that's pretty trivial. If you REALLY want to use SEARCH
ALL then you need to use OCCURS DEPENDING.)
3. There is no need for SORT (either external or internal); ISAM sorts
it as it is created.
4. I don't like re-inventing the wheel; everything you need has already
been written by the people who wrote ISAM...
So...
1. Set up an ISAM file for "temporary" use that has the required key and
element (record) structure you need. (Each record on the file will be an
element in the table.) Define this file for sequential access and give
it a "fairly large" block size. (Most of the data manipulation will then
be in memory, but you don't have to worry about it.)
2. As you receive the elements, write them to the ISAM file.
3. When you need to use the table, perform a routine that reads the ISAM
file and writes sequentially to the table. (Loads the table from the
file with one sequential pass.)
At this point you should stop and ask yourself why you need the table at
all. Why not just get records randomly from the ISAM file?
The answer will depend on how you use the table. Are you sharing it
between several modules, for example? Once it is built does it not
change? (Until the next "set" of data causes it to be re-loaded...)
Is there a great deal of access to it (where physical IO could
accumulate to slow things down...)?
Kellie originally asked for opinions.
Given the constraints imposed by using COBOL (no LINQ available), mine
is pretty close to Greg's...
Pete.
I think Greg's suggestion to use an ISAM file instead of a table
is a far superior method since this will eliminate the need to
initialize the table, and will shorten the runtime instruction
path since COBOL programs are I/O bound rather than CPU bound.
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
On Thursday, 31 May 2018 02:21:06 UTC+10, Kellie Fitton wrote:
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
I think if Pete agrees then the ISAM idea carries more weight.
I tend to Open for Output then close and then Open again for I-O. There was a good reason for this that escapes me. Even elephants/mammoths don't have perfect memory.
Next is the file name. If you have multiple simultaneous users you may want a unique file name for each user session and there are several ways to do this.
If your KEY is not unique then you can generate a sequence number as the key and have a secondary key for searches (Start, Read-Next).
Another tip I employ is to always have a flag to indicate whether a file is open. I tend to use myfilename-open which is Y or N. If the file is open successfully set the flag to Y. When it closes set the flag to N. This way you can open and close the file in many places.
Also most I-O to this file will be in cache memory which can be a bit slower but you are semi-employing an in memory table without reinventing the wheel re binary searches.
I hope this is sufficiently clear.
I'd use an 88 level with a value of 1 or 0... :-)
01  filler pic x value space.
    88 myfilenameOPEN   value '1'.
    88 myfilenameCLOSED value '0'.
In terms of execution efficiency of the ISAM solution, it comes down
largely to how much of the file you can buffer in memory, but if you ran
a benchmark I think you would be agreeably surprised by the speed of it.
The actual processing logic is certainly much simpler than manipulating
and initializing your table, if the table is truly "large".
On Sunday, June 3, 2018 at 7:28:19 PM UTC+12, pete dashwood wrote:
In terms of execution efficiency of the ISAM solution, it comes down largely to how much of the file you can buffer in memory, but if you ran
a benchmark I think you would be agreeably surprised by the speed of it. The actual processing logic is certainly much simpler than manipulating and initializing your table, if the table is truly "large".
ISAM lookups may be 10 times _slower_ than a table SEARCH ALL.
I just did a benchmark on a slow system. 5,000,000 SEARCH ALLs on a 50,000 sized table is < 2 seconds. The same number of ISAM reads on the same data takes 20 sec. Load of the table from the ISAM file is insignificant.
YMMV
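A sketch of how such a comparison could be run (not the actual benchmark program; k, ws-lookup-key and the timing fields are invented helpers, the ODO table and the indexed sort-work file are as defined elsewhere in the thread, and the file is assumed already open):

01  k               pic 9(9) comp-5.
01  ws-lookup-key   pic x(8).
01  t1              pic 9(8).
01  t2              pic 9(8).

*> 5,000,000 in-memory lookups with SEARCH ALL
    accept t1 from time
    perform 5000000 times
        compute k = function random * ws-table-counter + 1
        move table-plan (k) to ws-lookup-key
        search all ws-table-items
            at end continue
            when table-plan (table-index) = ws-lookup-key
                continue
        end-search
    end-perform
    accept t2 from time
    display "search all: " t1 " " t2

*> the same lookups done as keyed reads against the indexed file
    accept t1 from time
    perform 5000000 times
        compute k = function random * ws-table-counter + 1
        move table-plan (k) to sw-key
        read sort-work record
            invalid key continue
        end-read
    end-perform
    accept t2 from time
    display "isam reads: " t1 " " t2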
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
Probably the most efficient way is to set up an independent
record with all values initialized. Move that record to the
table as needed.
I just tested this code as an independent record to initialize the table:
for every needed occurrence: move ws-repository to ws-table-items.
it should work just fine since an alphanumeric move is done one byte at
a time from left to right, and stops when the end of the shortest field
is encountered. I think the compiler should issue a warning message though about moving a field to a part of itself just as a notice information.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-repository.
05 format-table.
10 format-alphanumeric pic x(8) value spaces.
10 format-numeric pic s9(9)v99 comp-3 value +0.
05 ws-table-items.
10 filler occurs 1 to 53000 times depending on ws-table-counter.
15 table-plan pic x(8).
15 table-member pic s9(9)v99 comp-3.
I may be a bit of an extinct Mammoth elephant but have been doing Cobol for 40 years.
A 1 MByte memory table these days is small - but doing a binary search or sort - that is really extinct.
I am not really answering for an in memory table and may advise against it.
Your question: What is the most optimized method to initialize a mammoth table - that needs to be initialized constantly.
I will only address a suggestion that you write a temproary sort-work using an ISAM file. It is extremly quick and efficient. I gave up using COBOL Sort in the 1980's and moved to using temproary ISAM files and let the file system handle sorting.
I introduced a standard report structure that is two pass. The first pass is to build an ISAM file with a primary key that is say 32 bytes and the program varies the key according to user selection criteria. The 2nd pass processes the sort-work file. This works well today and is quick.
If you need more on what primary key to write, what secondary key to write then I can expand.
You wrote - initialized constantly - and that needs more explanation.
Greg
Table-wise this table is small for an ISAM file, that's why I
elected to use an in-memory table since the search all is still
very fast for a 1 MByte table.
The table needs to get refreshed/reset periodically so it can
accommodate a new set of fresh data collected from a master file.
Hence, the initialization algorithm must reset the old data and
prepare the table for the re-populate process.
I am very intrigued by your two pass report structure. I hope
you have the time to elaborate on the process of varying the
keys according to the user's selection criteria. Thanks...
The only 'reset' you need is MOVE ZERO TO ws-table-count.
On Thursday, 31 May 2018 02:21:06 UTC+10, Kellie Fitton wrote:
Hi Folks,
One of my programs is handling a mammoth table that needs to be
initialized constantly. It is a million-byte table and used for
lookup records (binary search all) to increase the speed of the
program. The clause occurs depending on is used to create the
table accordingly. Moreover, to ensure reduced CPU consumption,
the initialization algorithm is using reference modifications to
obviate initializing the whole table more often that required.
I need your kind help with the following question:
What is the most optimized method to initialize a mammoth table?
Your thoughts and opinions are appreciated.
COBOL - the elephant that can stand on its trunk...
I think if Pete agrees then the ISAM idea carries more weight. I just add that you must have a KEY. If you were doing a binary search you must be searching for some value. This should be the primary key to what I call the sort-work file. You just close and open it for output and it is initialized.
I tend to Open for Output then close and then Open again for I-O. There was a good reason for this that escapes me. Even elephants/mammoths don't have perfect memory.
Next is the file name. If you have multiple simultaneous users you may want a unique file name for each user session and there are several ways to do this.
If your KEY is not unique then you can generate a sequence number as the key and have a secondary key for searches (Start, Read-Next).
Another tip I employ is to always have a flag to indicate whether a file is open. I tend to use myfilename-open which is Y or N. If the file is open successfully set the flag to Y. When it closes set the flag to N. This way you can open and close the file in many places. E.g. to refresh the file, test whether myfilename-open = Y, then close it, then reopen it. This is pseudo code for convenience rather than actual correct COBOL syntax. It is also a very good way to make sure all files are closed on exit if you have, and you should have, one exit point.
Also most I-O to this file will be in cache memory which can be a bit slower but you are semi-employing an in memory table without reinventing the wheel re binary searches.
I hope this is sufficiently clear.
Greg
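A sketch of that non-unique-key arrangement (the file name, field sizes and the ws- helper items are invented): the prime key is a running sequence number, the value actually searched on is an alternate key declared WITH DUPLICATES, and lookups go through START followed by READ NEXT:

    select sort-work assign to ws-sortwork-name
        organization is indexed
        access mode  is dynamic
        record key   is sw-seq-no
        alternate record key is sw-search-key
            with duplicates
        file status  is ws-status.

fd  sort-work.
01  sort-work-record.
    05  sw-seq-no        pic 9(9).
    05  sw-search-key    pic x(32).
    05  sw-data          pic x(100).

*> writing: bump the sequence number so the prime key is always
*> unique even when the search key is not
    add 1 to ws-seq-no
    move ws-seq-no to sw-seq-no
    write sort-work-record
        invalid key display "write failed " ws-status
    end-write

*> searching: position on the alternate key, then read forward
    move ws-wanted-key to sw-search-key
    start sort-work key is >= sw-search-key
        invalid key display "no such key"
    end-start
    read sort-work next record
        at end continue
    end-read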
On Thursday, May 31, 2018 at 3:20:52 PM UTC+12, Kellie Fitton wrote:
Probably the most efficient way is to set up an independent
record with all values initialized. Move that record to the
table as needed.
I just tested this code as an independent record to initialize the table: for every needed occurrence: move ws-repository to ws-table-items.
it should work just fine since an alphanumeric move is done one byte at
a time from left to right, and stops when the end of the shortest field
is encountered. I think the compiler should issue a warning message though about moving a field to a part of itself just as a notice information.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-repository.
05 format-table.
10 format-alphanumeric pic x(8) value spaces.
10 format-numeric pic s9(9)v99 comp-3 value +0.
05 ws-table-items.
10 filler occurs 1 to 53000 times depending on ws-table-counter.
15 table-plan pic x(8).
15 table-member pic s9(9)v99 comp-3.
"""for every needed occurrence: move ws-repository to ws-table-items. it should work just fine since an alphanumeric move is done one byte atThe sending variable format-alphanumeric is size pic x(8)
a time from left to right, and stops when the end of the shortest field
is encountered."""
No. An alphanumeric move will pad out the end of the receiving field with spaces. Think of MOVE "A" TO WS-Name which is PIC X(40). Do you expect an "A" followed by whatever was in the remainder of that field before the move ? You should expect an "A" followed by 39 spaces.
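A tiny demonstration of that, if anyone wants to see it for themselves (ws-name is just an example field):

identification division.
program-id. movedemo.
data division.
working-storage section.
01  ws-name   pic x(40).
procedure division.
    move all "*" to ws-name
    move "A" to ws-name
    display ">" ws-name "<"
*> shows >A ...< with 39 trailing spaces; none of the old
*> asterisks survive the move
    stop run.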
On Thursday, May 31, 2018 at 11:47:11 AM UTC+12, Kellie Fitton wrote:
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
No. Your description tells me that you are initializing one more than the 'exact number of occurrences only'. When you have put something into position 1 the 'exact number of occurrences' is 1, but you are then initializing position 2.
Do you need an extra empty initialized occurrence at the end of the array which is then included in the ODO ? So that after 'putting something in the first position' the 'occurrences' is 2 ?
What is done in 'initialize the next position (2)' that cannot be done in 'put something into position 2'.
Micro Focus had an option that allowed OPEN I-O to create the file if it did not already exist. Other systems would give a '35' file status and fail. However OPEN OUTPUT would delete an existing file and recreate a new empty one which may not be useful unless you have just detected a '35' file status.
On Sunday, June 3, 2018 at 3:33:14 PM UTC-7, Richard wrote:
On Thursday, May 31, 2018 at 3:20:52 PM UTC+12, Kellie Fitton wrote:
Probably the most efficient way is to set up an independent
record with all values initialized. Move that record to the
table as needed.
I just tested this code as an independent record to initialize the table: for every needed occurrence: move ws-repository to ws-table-items.
it should work just fine since an alphanumeric move is done one byte at
a time from left to right, and stops when the end of the shortest field is encountered. I think the compiler should issue a warning message though
about moving a field to a part of itself just as a notice information.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-repository.
05 format-table.
10 format-alphanumeric pic x(8) value spaces.
10 format-numeric pic s9(9)v99 comp-3 value +0.
05 ws-table-items.
10 filler occurs 1 to 53000 times depending on ws-table-counter.
15 table-plan pic x(8).
15 table-member pic s9(9)v99 comp-3.
"""move ws-repository to ws-table-items""""""for every needed occurrence: move ws-repository to ws-table-items. it should work just fine since an alphanumeric move is done one byte at
a time from left to right, and stops when the end of the shortest field
is encountered."""
No. An alphanumeric move will pad out the end of the receiving field with spaces. Think of MOVE "A" TO WS-Name which is PIC X(40). Do you expect an "A" followed by whatever was in the remainder of that field before the move ? You should expect an "A" followed by 39 spaces.
The sending variable format-alphanumeric is size pic x(8)
The receiving variable table-plan is size pic x(8)
both variables are same size, same length - NO remainder - NO Pad Out..."""and stops when the end of the shortest field is encountered."""
On Sunday, June 3, 2018 at 3:14:10 PM UTC-7, Richard wrote:
On Thursday, May 31, 2018 at 11:47:11 AM UTC+12, Kellie Fitton wrote:
Depends on the content of the table. Only one type, say, binary?
Or, mixed types, binary and alphanumeric?
It would be helpful to know the organization as defined in working-storage.
But, since you are using ODO, what do you need to initialize?
The table organization are a combination of binary comp-3
and alphanumeric. The table is populated based on ODO and
the initialization technique is to initialize as needed
only. Initializing the first occurrence in the table then
when putting something in the first position the algorithm
will initialize the next. Therefore, initializing the exact
number of occurrences only.
No. Your description tells me that you are initializing one more than the 'exact number of occurrences only'. When you have put something into position 1 the 'exact number of occurrences' is 1, but you are then initializing position 2.
Do you need an extra empty initialized occurrence at the end of the array which is then included in the ODO ? So that after 'putting something in the first position' the 'occurrences' is 2 ?
What is done in 'initialize the next position (2)' that cannot be done in 'put something into position 2'.
The initialization logic is: [format-table-items-as-needed-only].
It will initialize the first occurrence in the table, then after
putting something in the first-position-of-the-table (1), the
next position will be initialized when the table has the second
occurrence and is ready to get populated with data for position (2).
I think Greg's suggestion to use an ISAM file instead of a table
is a far superior method since this will eliminate the need to
initialize the table, and will shorten the runtime instruction
path since COBOL programs are I/O bound rather than CPU bound.
Have you considered using a hash table rather than using a binary search ?
On Sunday, June 3, 2018 at 2:47:08 PM UTC-7, Greg Wallace wrote:
I don't want to write a new program so I used one of my two-pass report programs. It was reading 200,000 records and writing to a Sort file and took about 6 seconds to when the sort file is produced. When I used standard user selection options to reduce it to 50,000 output records in the Sort file it still took about 6 seconds. So the reading of the entire file was the main delay.
Greg
Greg,
Do you use Micro Focus COBOL compiler? if so:
Did you set the environment variable IDXDATBUF to increase the buffer size? The default value is 0; increasing its value will improve file access speed. The variable must be set in increments of 4096:
SET IDXDATBUF=8192
This elephant was using MicroFocus (MF) COBOL on the first IBM PC that only had two floppy disks. MF COBOL was one of only 20 Apps certified for the release of the first IBM PC. I was not happy with MF ISAM for many reasons and in about 1990 switched to AcuCobol. Their ISAM method is called Vision and I have found it very reliable to this day. MF no doubt improved theirs subsequently.
The table is 1 MByte sized and will be searched often so
a binary search would be more efficient and simpler.
I tend to Open for Output then close and then Open again for I-O. There
was a good reason for this that escapes me. Even elephants/mammoths
don't have perfect memory.
On 3/06/2018 7:43 PM, Greg Wallace wrote:
I think if Pete agrees than the ISAM idea carries more weight.
It was a nice thing to say, Greg, but it really isn't true; ideas stand
on their own merit, not on who espouses them or doesn't... :-)
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
Why do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,"""The table needs to be initialized (formatted) prior to being populated"""
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter holds the highest position effectively occupied
in the repository table when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
Why do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-table-counter
move low-values to ws-table-items(ws-table-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
Why do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
Just to show that I am capable of running tests and timing them, which is what you should be doing, I have done 'initializing' a table 3 ways; the results are:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
Why do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
On Sunday, June 3, 2018 at 7:28:19 PM UTC+12, pete dashwood wrote:
In terms of execution efficiency of the ISAM solution, it comes down
largely to how much of the file you can buffer in memory, but if you ran
a benchmark I think you would be agreeably surprised by the speed of it.
The actual processing logic is certainly much simpler than manipulating
and initializing your table, if the table is truly "large".
ISAM lookups may be 10 times _slower_ than a table SEARCH ALL.
I just did a benchmark on a slow system. 5,000,000 SEARCH ALLs on a 50,000 sized table is < 2 seconds. The same number of ISAM reads on the same data takes 20 sec. Load of the table from the ISAM file is insignificant.
YMMV
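For readers following along, the in-memory lookup being timed here is just a standard SEARCH ALL against the table as defined earlier in the thread. A minimal sketch only (ws-lookup-plan, ws-found-member and the not-found flag are made-up names; the table is assumed to be already loaded in ascending table-plan order):
01 ws-lookup-plan pic x(8).
01 ws-found-member pic s9(9)v99 comp-3.
01 ws-lookup-flag pic x value "N".
88 lookup-missed value "Y".
*> binary search on the key named in the ASCENDING KEY clause
move "N" to ws-lookup-flag
search all ws-table-items
at end
set lookup-missed to true
when table-plan (table-index) = ws-lookup-plan
move table-member (table-index) to ws-found-member
end-search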
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-counter
move low-values to ws-table-items(ws-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
Just to show that I am capable of running tests and timing them, which is what you should be doing, I have done 'initializing' a table 3 ways, the results are:
Each time is for 10,000 repeats and for 50,000 entries
1. move low-values to ws-repository : 0.12 seconds
2. move low-values to ws-table-items(index) : 7.2 seconds
3. move low-values to ws-repository(calculated:length of entry) : 59 seconds
So, not only is your code wrong but it is the worst way of doing the initialization by a factor of 48000%.
Your original code was at least 6000% slower than the best.
And it doesn't need to be done anyway.
Also, as yet another failure, ws-table-position is only 5 digits (pic 9(5)) and this will overflow when you multiply 14 * 53000, or 14 * any number more than 7142.
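For anyone who wants to reproduce numbers like these, a very small harness is enough. This is only a sketch: the data names reuse those from the thread, the repeat count of 10,000 matches the test above, and FUNCTION CURRENT-DATE is assumed to be available (it resolves to hundredths of a second at best, hence the large repeat count):
01 ws-start-stamp pic x(21).
01 ws-end-stamp pic x(21).
01 ws-repeat pic 9(9) comp-5.
*> set the ODO counter first so the group move covers the whole loaded area;
*> the length of ws-repository normally follows the current counter value
move 50000 to ws-table-counter
move function current-date to ws-start-stamp
perform varying ws-repeat from 1 by 1 until ws-repeat > 10000
move low-values to ws-repository
end-perform
move function current-date to ws-end-stamp
display "started " ws-start-stamp " ended " ws-end-stamp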
On Tue, 5 Jun 2018 08:26:12 -0700 (PDT), Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
If both table-plan and table-member are filled in then the
initialization is a waste of computer cycles since the filling of
table-plan will overwrite the low-values in position 1 of table-plan.
Clark Morris
On 6/5/2018 4:25 PM, Richard wrote:
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-counter
move low-values to ws-table-items(ws-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
Roger the above.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
+1 to this.
On 3/06/2018 9:53 PM, Richard wrote:
On Sunday, June 3, 2018 at 7:28:19 PM UTC+12, pete dashwood wrote:
In terms of execution efficiency of the ISAM solution, it comes down
largely to how much of the file you can buffer in memory, but if you ran
a benchmark I think you would be agreeably surprised by the speed of it.
The actual processing logic is certainly much simpler than manipulating
and initializing your table, if the table is truly "large".
ISAM lookups may be 10 times _slower_ than a table SEARCH ALL.
I just did a benchmark on a slow system. 5,000,000 SEARCH ALLs on a 50,000 sized table is < 2 seconds. The same number of ISAM reads on the same data takes 20 sec. Load of the table from the ISAM file is insignificant.
YMMV
Thanks for that, Richard. It appears that random ISAM access may be
worse than I would have expected...
That reinforces the case for loading the table and then using the table
for random retrieval (where this makes sense to do, of course.)
I noted (and completely agree with) comments by you and Clark under
Kellie's post.
There seems to be some fundamental misunderstanding about
"initializing" then overwriting.
Hopefully, the posts have helped to clear it up.
Pete.
--
I used to write COBOL; now I can do anything...
On Tuesday, June 5, 2018 at 7:24:30 PM UTC-7, pete dashwood wrote:
On 3/06/2018 9:53 PM, Richard wrote:
On Sunday, June 3, 2018 at 7:28:19 PM UTC+12, pete dashwood wrote:
In terms of execution efficiency of the ISAM solution, it comes down
largely to how much of the file you can buffer in memory, but if you ran
a benchmark I think you would be agreeably surprised by the speed of it.
The actual processing logic is certainly much simpler than manipulating
and initializing your table, if the table is truly "large".
ISAM lookups may be 10 times _slower_ than a table SEARCH ALL.
I just did a benchmark on a slow system. 5,000,000 SEARCH ALLs on a 50,000 sized table is < 2 seconds. The same number of ISAM reads on the same data takes 20 sec. Load of the table from the ISAM file is insignificant.
YMMV
Thanks for that, Richard. It appears that random ISAM access may be
worse than I would have expected...
That reinforces the case for loading the table and then using the table for random retrieval (where this makes sense to do, of course.)
I noted (and completely agree with) comments by you and Clark under Kellie's post.
There seems to be some fundamental misunderstanding about
"initializing" then overwriting.
Hopefully, the posts have helped to clear it up.
Pete.
--
I used to write COBOL; now I can do anything...
Pete,
As mentioned in my question: the table needs to get refreshed
and reset Constantly. The initialize process must REMOVE the
old data from the table, then the NEW SET OF FRESH DATA will
re-populate the table periodically. Hence, re-initialize...
On Tuesday, June 5, 2018 at 3:11:51 PM UTC-7, Kerry Liles wrote:
On 6/5/2018 4:25 PM, Richard wrote:
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-counter
move low-values to ws-table-items(ws-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
Roger the above.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
+1 to this.
Richard,
I already tested and compared several sets of initialization
methods before posting my question. The Initialize process was
conducted with the following methods:
initialize ws-repository
move low-values to ws-repository
perform varying loop
move spaces to table-plan
move zeros to table-member
end-perform
calculate ws-table-position
The last method was faster by a considerable margin [70%].
On Wednesday, June 6, 2018 at 4:05:26 PM UTC+12, Kellie Fitton wrote:
On Tuesday, June 5, 2018 at 3:11:51 PM UTC-7, Kerry Liles wrote:
On 6/5/2018 4:25 PM, Richard wrote:
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-counter
move low-values to ws-table-items(ws-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
Roger the above.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
+1 to this.
Richard,
I already tested and compared several sets of initialization
methods before posting my question. The Initialize process was
conducted with the following methods:
initialize ws-repository
move low-values to ws-repository
perform varying loop
move spaces to table-plan
move zeros to table-member
end-perform
calculate ws-table-position
The last method was faster by a considerable margin [70%].
Then I would posit that your description of what your code does is simply not true.
The 'calculate' code would move low-values to the table from byte 1 for the number of entries to whatever the ws-table-count held. You claimed that you did:
"""The initialization is done As-Needed-Only for each table-row."""
and previously you had claimed:
"""The initialization logic is: [format-table-items-as-needed-only].
It will initialize the first occurrence in the table, then after
putting something in the first-position-of-the-table (1), the
next position will be initialized when the table have the second
occurrence and ready to get populated with data for position (2)"""
It is as if you are unaware of what you are actually doing in the code.
It may well be that the 'calculate ws-table-position' is 'faster', especially when ws-table-counter is zero, as it will be in your code sample, because it will do nothing.
If ws-table-counter is > zero and you are doing this 'for each table-row' as you load the data then it is overwriting the data already loaded.
Get your act together and work out what your code really is and what it is supposed to be doing; post actual code from your compiled program instead of retyping what you guess it to be; and stop wasting everyone's time.
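For contrast, initialization that really is 'as-needed-only for each table-row', which is what the description claims, would clear exactly one new occurrence, along these lines (a sketch only, reusing the data names from the posts; as already argued, even this is unnecessary if both fields are about to be overwritten):
*> claim the next occurrence and clear just that one row
add 1 to ws-table-counter
move spaces to table-plan (ws-table-counter)
move zero to table-member (ws-table-counter)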
On Tuesday, June 5, 2018 at 5:19:58 PM UTC-7, Clark F Morris wrote:
On Tue, 5 Jun 2018 08:26:12 -0700 (PDT), Kellie Fitton
<KELLIEFITTON@yahoo.com> wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
If both table-plan and table-member are filled in then the
initialization is a waste of computer cycles since the filling of
table-plan will overwrite the low-values in position 1 of table-plan.
Clark Morris
Clark,
As I have mentioned in this thread previously, the initialize
process must happen periodically to reset, refresh the table
and prepare it for a new set of replacement data. The reset
and initialize process is much faster when done based on the
number of entries in the table: [calculate table-position].
On Tuesday, June 5, 2018 at 7:24:30 PM UTC-7, pete dashwood wrote:
On 3/06/2018 9:53 PM, Richard wrote:
On Sunday, June 3, 2018 at 7:28:19 PM UTC+12, pete dashwood wrote:
In terms of execution efficiency of the ISAM solution, it comes down
largely to how much of the file you can buffer in memory, but if you ran
a benchmark I think you would be agreeably surprised by the speed of it.
The actual processing logic is certainly much simpler than manipulating
and initializing your table, if the table is truly "large".
ISAM lookups may be 10 times _slower_ than a table SEARCH ALL.
I just did a benchmark on a slow system. 5,000,000 SEARCH ALLs on a 50,000 sized table is < 2 seconds. The same number of ISAM reads on the same data takes 20 sec. Load of the table from the ISAM file is insignificant.
YMMV
Thanks for that, Richard. It appears that random ISAM access may be
worse than I would have expected...
That reinforces the case for loading the table and then using the table
for random retrieval (where this makes sense to do, of course.)
I noted (and completely agree with) comments by you and Clark under
Kellie's post.
There seems to be some fundamental misunderstanding about
"initializing" then overwriting.
Hopefully, the posts have helped to clear it up.
Pete.
--
I used to write COBOL; now I can do anything...
Pete,
As mentioned in my question: the table needs to get refreshed
and reset Constantly. The initialize process must REMOVE the
old data from the table, then the NEW SET OF FRESH DATA will
re-populate the table periodically. Hence, re-initialize...
As a business analyst, I would want to know more about the application to look at why this is necessary. There may be some other solution.
Greg
Why not just reset ws-table-counter and possibly table-index,
depending on what you're using to store newly read records?
Louis
As others have pointed out, there is no need to clear the table if your program simply keeps track of the 'currently' highest used entry in the table (setting that back to 0 or 1 or whatever you like to use as the
first entry effectively "clears" the table)
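In code terms the 'reset instead of clear' idea amounts to little more than the sketch below. The master-file names, mf-plan, mf-member and the end-of-file flag are illustrative only, and whether the counter may legally sit at 0 between loads when the ODO minimum is 1 can depend on the compiler's bounds checking:
01 ws-master-flag pic x value "N".
88 master-at-end value "Y".
*> logically empty the table: nothing is 'in use' above occurrence zero
move 0 to ws-table-counter
*> reload; each qualifying record simply claims the next occurrence
perform until master-at-end
read master-file
at end set master-at-end to true
not at end
add 1 to ws-table-counter
move mf-plan to table-plan (ws-table-counter)
move mf-member to table-member (ws-table-counter)
end-read
end-perform
The entries still have to end up in ascending table-plan order, of course, or the SEARCH ALL against the table is no longer valid.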
As a business analyst, I would want to know more about the application to look at why this is necessary. There may be some other solution.
Greg
Greg,
The initialize process resets the table before a new set of
data re-populate the table. Moving low-values to the table
with the calculated table position works very well and fast.
It clears only the number of data entry occurrences without
initializing the entire table [max size].
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
You haven't answered the question that Greg asked which is: "why this is necessary". You have been told, several times, that it is not, and yet you continue to 'refine and optimize the unrequired'.
On Wednesday, 6 June 2018 15:20:29 UTC+10, Richard wrote:
On Wednesday, June 6, 2018 at 4:05:26 PM UTC+12, Kellie Fitton wrote:
On Tuesday, June 5, 2018 at 3:11:51 PM UTC-7, Kerry Liles wrote:
On 6/5/2018 4:25 PM, Richard wrote:
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-counter
move low-values to ws-table-items(ws-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
Roger the above.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
+1 to this.
Richard,
I already tested and compared several sets of initialization
methods before posting my question. The Initialize process was
conducted with the following methods:
initialize ws-repository
move low-values to ws-repository
perform varying loop
move spaces to table-plan
move zeros to table-member
end-perform
calculate ws-table-position
The last method was faster by a considerable margin [70%].
Then I would posit that your description of what your code does is simply not true.
The 'calculate' code would move low-values to the table from byte 1 for the number of entries to whatever the ws-table-count held. You claimed that you did:
"""The initialization is done As-Needed-Only for each table-row."""
and previously you had claimed:
"""The initialization logic is: [format-table-items-as-needed-only].
It will initialize the first occurrence in the table, then after
putting something in the first-position-of-the-table (1), the
next position will be initialized when the table have the second
occurrence and ready to get populated with data for position (2)"""
It is as if you are unaware of what you are actually doing in the code.
It may well be that the 'calculate ws-table-position' is 'faster', especially when ws-table-counter is zero, as it will be in your code sample, because it will do nothing.
If ws-table-counter is > zero and you are doing this 'for each table-row' as you load the data then it is overwriting the data already loaded.
Get your act together and work out what your code really is and what it is supposed to be doing; post actual code from your compiled program instead of retyping what you guess it to be; and stop wasting everyone's time.
I would like to take a step back. You opened a thread with 'Can mighty Cobol carry an elephant'. That is a brilliant title and you have engaged much discussion. While I may soften Richard's remarks, you did leave a lot of gaps in the explanation.
Discussion about in memory tables and initialization somewhat bores me.
As a business analyst, I would want to know more about the application to look at why this is necessary. There may be some other solution.
On Wednesday, June 6, 2018 at 12:13:45 AM UTC-7, Louis Krupp wrote:
Why not just reset ws-table-counter and possibly table-index,
depending on what you're using to store newly read records?
Louis
Louis,
I reset the ws-table-counter back to 1 and it did clear
the old entries from the table. Thanks...
Now the code has ws-table-items with the occurs (which
may be an improvement) rather than a subsidiary filler field with the
occurs.
The ws-table-position variable should be 9 digits, I was typing fast to explain the process logic while talking on my cellphone.
When I posted my question I said: the table needs to be
initialized CONSTANTLY.
Also, as I have mentioned in this thread Previously, the table needs to get Refreshed
and Reset PERIODICALLY before a new set of fresh data
can re-populate the table.
On Tuesday, June 5, 2018 at 3:11:51 PM UTC-7, Kerry Liles wrote:
On 6/5/2018 4:25 PM, Richard wrote:
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
"""The table needs to be initialized (formatted) prior to being populated"""
NO IT DOES NOT. You seem to be incredibly resistant to advice.
'Initialization' does NOT 'format' the data area. The 'format' of the table items is set by picture clauses during the compile.
In fact, low-values may not be valid in table-member, depending on implementation, because the final nibble will be the code for the sign value. This actually doesn't matter because you will be moving a valid number to it before it is used anyway.
If you are populating the table sequentially, and thus moving data into both fields for every occurrence up to ws-table-counter, then the 'initialization' is just a waste of time. The resulting table will be identical without it.
"""The initialization is done As-Needed-Only for each table-row."""
That is NOT what your code is doing. Your code is moving low-values from byte 1 of the whole table up to the current limit of the table. If you are executing this code for each data item then you are overwriting the data already in the table each time.
Your code seems to change each time you post, and you seem to get it wrong each time. Now the code has ws-table-items with the occurs (which may be an improvement) rather than a subsidiary filler field with the occurs.
At the very least you can now speed up the 'initialization' by simply doing:
add 1 to ws-counter
move low-values to ws-table-items(ws-counter)
which will avoid overwriting all the current data items already loaded and will be faster than reference notation. But it is still a complete waste.
Another improvement is to 'move low-values to ws-repository' before loading any data (as was suggested by Kerry). This is likely to be much faster than doing it item by item because of the overhead of using a subscript and of doing thousands of moves rather than just one.
Roger the above.
I suggest that you post code that has been compiled and TESTED rather than just making up more stuff on the fly and getting it wrong.
You should be testing the speed of each of these methods and also seeing that the results match what you expect. Why aren't you doing that?
+1 to this.
Richard,
I already tested and compared several sets of initialization
methods before posting my question. The Initialize process was
conducted with the following methods:
initialize ws-repository
move low-values to ws-repository
perform varying loop
move spaces to table-plan
move zeros to table-member
end-perform
calculate ws-table-position
The last method was faster by a considerable margin [70%].
I have done some testing of your 'calculate ws-table-position' and can't get it significantly faster than 'move low-values to ws-repository' without reducing the number of entries that it clears to being much smaller numbers. To get it 70% faster would require only clearing about a third or less of the table. So how are you determining the number that does need to be cleared? What number did you use in your test?
In article <d630c630-3881-4a87-881a-a6c7a55aaf55@googlegroups.com>,
Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
[snip]
When I posted my question I said: the table needs to be
initialized CONSTANTLY.
This is, at best, loose terminology. If something is being done
constantly then nothing else is being done.
Also, as I have mentioned in
this thread Previously, the table needs to get Refreshed
and Reset PERIODICALLY before a new set of fresh data
can re-populate the table.
Sorry, there's no time to refresh and reset... something else is being
done constantly, remember?
I suggest we start afresh. Assuming that the program to which you are referring has already been written:
1) What does the program currently do?
2) What should the program be doing better?
Assuming that the program has not already been written:
0) What is the program going to do?
DD
On Wednesday, June 6, 2018 at 6:46:45 AM UTC-7, Kerry Liles wrote:
As others have pointed out, there is no need to clear the table if your
program simply keeps track of the 'currently' highest used entry in the
table (setting that back to 0 or 1 or whatever you like to use as the
first entry effectively "clears" the table)
Kerry,
I reset the ws-table-counter back to 1 and it did clear
the old entries from the table. Thanks...
On Thursday, 7 June 2018 06:08:53 UTC+10, docd...@panix.com wrote:
In article <d630c630-3881-4a87-881a-a6c7a55aaf55@googlegroups.com>,
Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
[snip]
When I posted my question I said: the table needs to be
initialized CONSTANTLY.
This is, at best, loose terminology. If something is being done
constantly then nothing else is being done.
Also, as I have mentioned in
this thread Previously, the table needs to get Refreshed
and Reset PERIODICALLY before a new set of fresh data
can re-populate the table.
Sorry, there's no time to refresh and reset... something else is being
done constantly, remember?
I suggest we start afresh. Assuming that the program to which you are
referring has already been written:
1) What does the program currently do?
2) What should the program be doing better?
Assuming that the program has not already been written:
0) What is the program going to do?
DD
Kellie, you are sending everyone into a spin and I see many trying to help.
You have some master data that is constantly updated. Why? What is the nature of it.
An example of a different approach could be that the master data needs an alternate key.
On Tuesday, June 5, 2018 at 3:22:54 PM UTC-7, Richard wrote:
On Wednesday, June 6, 2018 at 3:26:14 AM UTC+12, Kellie Fitton wrote:
On Monday, June 4, 2018 at 1:33:53 PM UTC-7, Richard wrote:
The question still arises: why are you bothering to initialize the fields that you are going to overwrite ?
Is it because you wrongly think that a MOVE "stops when the end of the shortest field is encountered" and thus might leave junk in the receiving field ?
"ready to get populated with data"
What do you think happens that makes the entry "ready" other than just incrementing the ODO ? In an ODO table _all_ the entries, all 53000 of them exist all the time as defined. The only thing that ODO adds is setting a virtual upper bound check. If you are going to be moving data to all the subfields then 'initialization' adds nothing.
Your question was: "What is the most optimized method to initialize a mammoth table?".
The answer is, in the case you describe with ODO: Don't bother with initializing when the initialization just gets completely overwritten.
Richard,
The table needs to be initialized (formatted) prior to being
populated with the data collected from the master file. The
initialization is done As-Needed-Only for each table-row. The
ws-table-counter has the higher position in the repository
table effectively occupied when the initialization is going
to be made. Below is the code that calculates the new table
position prior to the data population into the table-row.
01 ws-table-counter pic 9(5) comp-5 value 0.
01 ws-table-position pic 9(5) comp-5 value 0.
01 ws-repository.
03 ws-table-items occurs 1 to 53000 times depending on
ws-table-counter
ascending table-plan
indexed by table-index.
05 table-plan pic x(8).
05 table-member pic s9(9)v99 comp-3.
compute ws-table-position =
(length of ws-table-items * ws-table-counter)
end-compute
move low-values to ws-repository (1:ws-table-position)
Just to show that I am capable of running tests and timing them, which is what you should be doing, I have done 'initializing' a table 3 ways, the results are:
Each time is for 10,000 repeats and for 50,000 entries
1. move low-values to ws-repository : 0.12 seconds
2. move low-values to ws-table-items(index) : 7.2 seconds
3. move low-values to ws-repository(calculated:length of entry) : 59 seconds
So, not only is your code wrong but it is the worst way of doing the initialization by a factor of 48000%.
Your original code was at least 6000% slower than the best.
And it doesn't need to be done anyway.
Also, as yet another failure, ws-table-position is only 5 digits (pic 9(5)) and this will overflow when you multiply 14 * 53000, or 14 * any number more than 7142.
The ws-table-position variable should be 9 digits, I was typing fast to explain the process logic while talking on my cellphone.
In article <7621f8ab-def8-4886-9363-8bfa8993dd39@googlegroups.com>,
Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
[snip]
The ws-table-position variable should be 9 digits, I was typing fast to
explain the process logic while talking on my cellphone.
Don't worry about not paying any attention to work-related questions you
are asking others to assist you with for free. I, for one, am paying double-less attention in my responses.
DD
On Tue, 5 Jun 2018 20:38:02 -0700 (PDT), Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
The ws-table-position variable should be 9 digits, I was typing fast to explain the process logic while talking on my cellphone.
Please say the cell phone conversation was about COBOL.
Louis
On Thursday, June 7, 2018 at 12:33:22 AM UTC-7, Louis Krupp wrote:
On Tue, 5 Jun 2018 20:38:02 -0700 (PDT), Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
The ws-table-position variable should be 9 digits, I was typing fast to explain the process logic while talking on my cellphone.
Please say the cell phone conversation was about COBOL.
Louis
Hi Louis,
As a matter of fact, it was. However, it was some bad news about my
favorite programming language---COBOL. It is making me angry and
emotional to say the least. I will explain why in a new post shortly.
I would like to hear your unbiased opinion, though. Thanks...
On Thursday, 7 June 2018 18:02:59 UTC+10, Kellie Fitton wrote:
On Thursday, June 7, 2018 at 12:33:22 AM UTC-7, Louis Krupp wrote:
On Tue, 5 Jun 2018 20:38:02 -0700 (PDT), Kellie Fitton <KELLIEFITTON@yahoo.com> wrote:
The ws-table-position variable should be 9 digits, I was typing fast to explain the process logic while talking on my cellphone.
Please say the cell phone conversation was about COBOL.
Louis
Hi Louis,
As a matter of fact, it was. However, it was some bad news about my favorite programming language---COBOL. It is making me angry and
emotional to say the least. I will explain why in a new post shortly.
I would like to hear your unbiased opinion, though. Thanks...
Sorry to hear that, Kellie, and I look forward to a new post.
If I was your Boss, I would not be questioning the COBOL language but the why. Why do you need a binary search on a constantly varying table? Why is it constantly varying? Is there a better way of doing it?
Greg
On Thursday, June 7, 2018 at 1:26:22 AM UTC-7, Greg Wallace wrote:
If I was your Boss, I would not be questioning the COBOL language but the why. Why do you need a binary search on a constantly varying table?
Why is it constantly varying? Is there a better way of doing it?
Hi Greg,
First, one of my programs functions as a sifting thread; it will
collect certain data from a master file based on some qualifying
criteria, patterns and relevant information.
The collected data
are loaded into the table temporarily for the purpose of lookup
and comparison against counterpart data mined from another file.
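Based on that description, the comparison pass is presumably something along the lines of the sketch below; counterpart-file, cp-plan and the two performed paragraphs are invented names, and the repository table is assumed to be freshly loaded and in ascending table-plan order:
01 ws-counterpart-flag pic x value "N".
88 counterpart-at-end value "Y".
perform until counterpart-at-end
read counterpart-file
at end set counterpart-at-end to true
not at end
search all ws-table-items
at end perform report-missing-plan
when table-plan (table-index) = cp-plan
perform compare-members
end-search
end-read
end-perform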