by Matt Leetz

I’ve been working in the Critical Food Studies Lab for the past two semesters, or my entire senior year. Over this time, I’ve been building a mathematical spreadsheet model to analyze the presence and intensity of food desert regions in the Chicagoland Metropolitan Statistical Area (MSA), which covers Chicago as well as its suburbs extending into Indiana and Wisconsin. I’m interested in taking geographic problem-solving methods and applying them to human geography, in the hope that we can better understand how these problems come to be and aid in the quest toward food security for all Americans. The end product of this project will be an interactive data model overlaid on a map of the area that highlights where food insecurity is most intense and provides qualitative information about the food vendors that do exist.

Last semester was spent collecting data to put into the model. There are five matrices involved in the Maximal Covering Location Problem (MCLP). We are using the MCLP, introduced by Church and ReVelle in 1974, to solve a discrete (accessible by the network of roads) problem of coverage by locating food facilities within a maximum allowed distance of customers. We set this as an arbitrary drive time of 20 minutes, which we can increase or decrease as our independent variable.

The first matrix is a distance matrix, in which we have calculated drive times between all of the Census blocks in the Chicagoland MSA. This by far took the most time to prepare, as the workflow handed down to me involved manually querying each address, receiving a JSON file, and extracting information from it into the format we needed. Luckily, we were able to use ArcMap’s Network Analyst extension instead, which cut the workload from roughly 400 hours to 20 minutes. Next comes the population matrix. I was able to collect population data from the US Census Bureau in the form of shapefiles, a file type used by Geographic Information Systems (GIS) software such as ArcMap and QGIS.
I was also able to find population centroid data, the geographically weighted mean of the population of each Census block, and apply it to the shapefiles. With those two out of the way, the other matrices needed are the constraint matrix (the maximum allowed drive time), the assignment matrix (a way of applying the many internal constraints necessary for the model to function), and the output matrix (a blank matrix growing impatient to be filled). These were easy to create in Excel this semester, and I have since been cleaning and perfecting the model.

This model has nearly 2,200 constraints (which I was able to generate with a Python script) and 4.7 million variables, making it extremely taxing on computers. We now have a specialized computer in the Food Institute dedicated to running the model as built, but at its current rate it will take 3 to 5 months to finish. I am now attacking the problem from an alternative route: using MATLAB code to perform the process through Mixed Integer Non-Linear Programming, which we can then run on the IU supercomputing cluster through remote access.
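The selection logic the MCLP encodes can be sketched in miniature. Everything numeric below is a made-up toy example (four census blocks, hypothetical drive times and populations), and brute-force enumeration stands in for the integer-programming solver the actual model requires; enumeration only works at this tiny scale.

```python
from itertools import combinations

# Toy drive-time matrix (minutes): entry [i][j] is the drive time from
# candidate facility site i to demand block j. Purely hypothetical data.
drive_times = [
    [0, 12, 25, 40],
    [12, 0, 15, 30],
    [25, 15, 0, 18],
    [40, 30, 18, 0],
]
population = [500, 300, 800, 200]  # demand at each block (hypothetical)
MAX_DRIVE = 20                     # the coverage standard, in minutes
P = 2                              # number of facilities to site

# Binary coverage matrix: 1 if site i reaches block j within MAX_DRIVE.
covers = [[1 if t <= MAX_DRIVE else 0 for t in row] for row in drive_times]

def covered_population(sites):
    """Total population within MAX_DRIVE of at least one chosen site."""
    return sum(pop for j, pop in enumerate(population)
               if any(covers[i][j] for i in sites))

# MCLP objective: choose P sites maximizing the covered population.
best = max(combinations(range(len(drive_times)), P), key=covered_population)
print(best, covered_population(best))  # -> (0, 2) 1800
```

Raising or lowering `MAX_DRIVE` plays the role of the independent variable described above: a looser drive-time standard lets fewer facilities cover more people.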
My work in the Critical Food Studies Lab has revolved around understanding food deserts in Central Indiana. A large part of my work is “tool-based,” meaning I apply many different Geographic Information Systems (GIS) techniques in order to discover and illuminate trends not easily appreciated within large datasets. Additionally, my work revolves around applying mathematical principles to the challenge of food deserts. Overall, my work is data-based, with great importance placed on the interpretation and manipulation of data.

Much of the work I performed this fall was concerned with data mining and database building. Geographically, my research area consists of a two-county ring surrounding Marion County, Indiana. In these counties, census tract data were collected concerning population sizes and each census tract’s geographically weighted center. From the population data, a “W-Matrix” was built to record the population size of each census tract. From the centroid data, a “D-Matrix” was built to capture the time it would take to drive from each centroid to every other centroid.

In order to create the D-Matrix, I used a network analysis tool in ArcMap (the program I use to perform my GIS tasks). ArcMap offers a series of network analysis tools that perform a variety of functions, and as the name suggests, each one depends on a “network” of some kind. For my work, I used a network of street centerlines covering every street in North America. For my analysis, I elected to use an OD cost matrix. This tool is ideal for the D-Matrix because of the values it accumulates: both the time and the distance between two points. The OD cost matrix draws “lines” based on the given network, accumulating data on the distance and time it would take to traverse each line.
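Conceptually, an OD (origin-destination) cost matrix is an all-pairs shortest-path computation over the road network. The real analysis uses ArcMap’s tool over a continental street network; the sketch below runs Dijkstra’s algorithm over a tiny hypothetical road graph just to show the idea.

```python
import heapq

# Hypothetical toy road network: each edge carries a drive time in minutes.
roads = {
    "A": {"B": 4, "C": 11},
    "B": {"A": 4, "C": 5, "D": 9},
    "C": {"A": 11, "B": 5, "D": 2},
    "D": {"B": 9, "C": 2},
}

def drive_times_from(origin):
    """Dijkstra's algorithm: shortest drive time from origin to every node."""
    dist = {origin: 0}
    queue = [(0, origin)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, w in roads[node].items():
            if t + w < dist.get(nbr, float("inf")):
                dist[nbr] = t + w
                heapq.heappush(queue, (t + w, nbr))
    return dist

# OD cost matrix: one row of travel times per origin centroid.
od_matrix = {origin: drive_times_from(origin) for origin in roads}
print(od_matrix["A"]["D"])  # shortest route A -> B -> C -> D: 11 minutes
```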
The lines form a matrix, but they are in actuality routes that carry data, very similar to the routes seen in Google Maps, for example. In ArcMap, all constraints were lifted with the exception of the vehicle used: each time was calculated for a single-occupancy vehicle. Performing the OD cost matrix analysis thus created a network of lines, with a line traveling from each centroid to every other centroid. With a total of 530 census tracts, the OD cost matrix consisted of 280,900 lines. Now that’s a lot of lines!

From those lines, I extracted all the time data and created a time database within Excel. From the time database, all the data is synthesized into the D-Matrix. The D-Matrix is rather large; keep in mind it’s 530 by 530 census tracts, a total of 280,900 data points. In order to synthesize the D-Matrix in a quick and easy manner, I used an OFFSET function within Excel. I also tried to build and use a macro for this but found the OFFSET function to be simpler and much more user-friendly. For those unfamiliar with Excel, OFFSET returns a cell or range at a given row and column offset from a reference, which makes it possible to fold a single column of data into a matrix. Rather than copying and pasting 530 columns of data, the OFFSET formula will do it quickly and easily. If you are having trouble moving columns of data into a matrix, much like I was, here is the formula I ended up creating:

=OFFSET(drv_tms!$F$2:$F$279842,COLUMN()-COLUMN($D$4)+((ROW()-ROW($D$4))*(ROWS(drv_tms!$F$2:$F$279842)/529)),0,1,1)

From the D-Matrix I then created an “S-Matrix,” which is concerned with various time limits or constraints. The S-Matrix is a binary analysis of the D-Matrix: it tests whether the travel time from each centroid falls under a certain time limit. The time limits I used for the S-Matrix were 5, 10, 15, 20, 25, 30, and 35 minutes. Again, the S-Matrix is binary: with the 5-minute time constraint, for example, if the travel time was under 5 minutes, a “1” was used to represent that line.
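The same fold-a-column-into-a-matrix step, and the binary thresholding that produces the S-Matrix, can be sketched outside Excel. The flat drive-time column below is a tiny hypothetical stand-in for the real 280,900-row export, assumed to be ordered origin by origin (row-major):

```python
# Hypothetical stand-in for the exported drive-time column: one value per
# origin-destination pair, ordered origin by origin. Real column: 280,900 rows.
n = 3
flat_times = [0, 7, 22, 7, 0, 12, 22, 12, 0]  # minutes

# Python analog of the OFFSET trick: fold the flat column into an n-by-n
# D-Matrix by slicing out one row of n values at a time.
d_matrix = [flat_times[i * n:(i + 1) * n] for i in range(n)]

# S-Matrix: binary coverage under a given time limit (here 15 minutes);
# 1 means the trip fits within the limit, 0 means it does not.
LIMIT = 15
s_matrix = [[1 if t <= LIMIT else 0 for t in row] for row in d_matrix]
print(s_matrix)  # -> [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```

Repeating the last step for each limit (5, 10, 15, ... minutes) yields one S-Matrix per drive-time constraint.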
If the travel time was over 5 minutes, a “0” was used. Binary here just means true and false, 1 being true and 0 being false. From these three matrices, my next step is to create covering models and run them through a solver.

by Hannah Gruber

My work with the Critical Food Studies Lab has centered on the lab’s co-director, Angela Babb, and her project on the Thrifty Food Plan. The Thrifty Food Plan (TFP) provides the basis for food stamp allotments given by the Supplemental Nutrition Assistance Program (SNAP). It uses an algorithm to determine market baskets containing sufficient nutrients for a healthy person based on age and sex. It has become increasingly clear that the allotments are not enough to cover the nutritious diet suggested by the TFP, especially for those who have specific diet restrictions such as allergies or intolerances.

Previous members of the Lab built a new model based on George Stigler’s calculations, showing that it is not possible to achieve the nutrient recommendations at minimal cost. Unfortunately, that model relies on outdated data: the early-2000s data behind the 2006 TFP. During my time in this lab, I’ve been working on recreating the model with updated nutrient profiles, food groups, and upper and lower limits for each nutrient. Some of this data comes from the USDA’s 2015-2020 Dietary Guidelines for Americans (DGAs), which provide information to help policymakers and health professionals influence Americans’ diets.

I hit a few bumps along the way, such as struggling to find data that fit the criteria needed for the new model. The USDA’s new DGAs reduced the food groups from 58 groups in the 2006 TFP model to just 19. An example of one of the updated food groups is Vegetables, which includes the subgroups Dark Green (broccoli), Red-orange (carrots), Beans and Peas, Starches (corn), and Other (cauliflower).
The upper and lower limits given in the new DGAs are largely the same as the ones used in the 2006 TFP and reflect a range of data from the late 1990s through the early 2000s. The limits give a range of safe levels of nutrients that can be consumed without negative health effects. Some of the nutrients don’t have any data on limits, so we are assuming there are no substantial consequences from low or high intakes. I finally managed to combine all of this information into a new spreadsheet and will be moving on from here to something slightly new.

This summer I plan to continue working with the TFP, but this time with the market baskets. I will go to all of the food outlets in Bloomington and attempt to purchase the market baskets within the amount allotted by the TFP. I’m looking forward to continuing my work with the TFP on a local level. I am hoping to find some interesting information on Bloomington’s grocery prices and whether or not any of the stores will be able to provide a (not so thrifty) food plan. Hopefully my work will contribute more proof that people using SNAP are not given enough to sustain a nutritious diet.
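The Stigler-style model underlying the TFP can be shown in miniature: find the cheapest basket whose nutrient totals stay between the lower and upper limits. The two foods, prices, nutrient values, and limits below are entirely hypothetical, and a coarse grid search stands in for the linear-programming solver the real model uses over 19 food groups and a full nutrient profile.

```python
from itertools import product

# Hypothetical mini diet model: price per unit and (nutrient1, nutrient2)
# per unit for each food. Real model: 19 food groups, full DGA nutrients.
foods = {
    "beans":   (1.00, (10, 2)),
    "spinach": (1.50, (3, 8)),
}
lower = (20, 10)  # minimum required intakes (hypothetical)
upper = (60, 40)  # maximum safe intakes (hypothetical)

best_cost, best_basket = None, None
# Coarse grid search over 0..10 units of each food (stand-in for an LP solve).
for qty in product(range(11), repeat=len(foods)):
    totals = [0, 0]
    cost = 0.0
    for q, (price, nutrients) in zip(qty, foods.values()):
        cost += q * price
        for k, v in enumerate(nutrients):
            totals[k] += q * v
    # Keep the cheapest basket whose totals fall within all limits.
    if all(lower[k] <= totals[k] <= upper[k] for k in range(2)):
        if best_cost is None or cost < best_cost:
            best_cost, best_basket = cost, qty
print(best_basket, best_cost)  # -> (2, 1) 3.5
```

Comparing `best_cost` against a benefit allotment is the crux of the argument above: if no feasible basket costs less than the allotment, the allotment cannot purchase a nutritionally adequate diet.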
## Authors

This blog features the current work of CFS Lab researchers in their own words.

## Archives
October 2021