Open a .osm.pbf file with fiona in python

I would like to open a .osm.pbf file using fiona in Python. I can't find much documentation on this. How does one do this?

I have done it using ogr2ogr.

Fiona is by design restricted to the conventional record model of data, i.e. all records (features) have the same fields associated with them. This means that Fiona reads shapefiles, but does not read more flexible formats such as the OSM PBF format.

You can check which drivers are supported in Fiona with:

import fiona
list(fiona.supported_drivers)

You then have two options: use the OGR Python drivers to read the data, or use ogr2ogr to convert the data to a format that Fiona can read. I think the second option is your best bet, as I find Fiona much easier to use.
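The second option could look roughly like the sketch below. It assumes that the ogr2ogr command-line tool is on the PATH, that GeoPackage is an acceptable intermediate format, and that the layer names are those exposed by GDAL's OSM driver. The helper names build_ogr2ogr_command and convert_and_read are hypothetical and only serve the illustration.

```python
import subprocess

# Layer names exposed by GDAL's OSM driver for .osm.pbf input.
OSM_LAYERS = ("points", "lines", "multilinestrings",
              "multipolygons", "other_relations")


def build_ogr2ogr_command(src_pbf, dst_gpkg, layer="points"):
    """Build the ogr2ogr argument list that copies one OSM layer to a GeoPackage."""
    if layer not in OSM_LAYERS:
        raise ValueError(f"unknown OSM layer: {layer!r}")
    # ogr2ogr -f <format> <destination> <source> [<layer> ...]
    return ["ogr2ogr", "-f", "GPKG", dst_gpkg, src_pbf, layer]


def convert_and_read(src_pbf, dst_gpkg, layer="points", limit=5):
    """Convert with ogr2ogr, then return the first few features read by Fiona."""
    import fiona  # imported here so the command builder works without Fiona

    subprocess.run(build_ogr2ogr_command(src_pbf, dst_gpkg, layer), check=True)
    with fiona.open(dst_gpkg, layer=layer) as src:
        # zip() stops after `limit` features, so large files are not fully read.
        return [feature for _, feature in zip(range(limit), src)]
```

Calling convert_and_read("extract.osm.pbf", "extract.gpkg", layer="lines") would then return a short list of features, where both file names are placeholders.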

Adding a background map to plots

This example shows how you can add a background basemap to plots created with the geopandas .plot() method. This makes use of the contextily package to retrieve web map tiles from several sources (OpenStreetMap, Stamen). Also have a look at contextily’s introduction guide for possible new features not covered here.

Let’s use the NYC borough boundary data that is available in geopandas datasets. Plotting this gives the following result:
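A minimal sketch of the idea follows, assuming geopandas and contextily are installed and that web tiles can be downloaded at plot time. The function name plot_with_basemap is hypothetical, and loading the borough data is left out because how that dataset is packaged depends on the geopandas version.

```python
def plot_with_basemap(gdf, alpha=0.5, figsize=(10, 10)):
    """Plot a GeoDataFrame over web map tiles and return the matplotlib axes."""
    import contextily as cx  # lazy import: only needed when actually plotting

    # Web tiles are typically served in Web Mercator (EPSG:3857),
    # so the data is reprojected before plotting.
    web_mercator = gdf.to_crs(epsg=3857)
    ax = web_mercator.plot(figsize=figsize, alpha=alpha, edgecolor="k")
    # Fetch tiles for the current axes extent and draw them underneath.
    cx.add_basemap(ax)
    return ax
```

The semi-transparent alpha lets the basemap show through the polygons.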

15 Python Libraries for GIS and Mapping

Python libraries are the ultimate extension in GIS because they allow you to boost its core functionality.

By using Python libraries, you can break out of the mold that is GIS and dive into some serious data science.

There are 200+ standard libraries in Python. But there are thousands of third-party libraries too. So, it’s endless how far you can take it.

Today, it’s all about Python libraries in GIS. Specifically, what are the most popular Python packages that GIS professionals use today? Let’s get started.

First, why even use Python libraries for GIS?

Have you ever noticed how GIS is missing that one capability you need it to do? Because no GIS software can do it all, Python libraries can add that extra functionality you need.

Put simply, a Python library is code someone else has written to make life easier for the rest of us. Developers have written open libraries for machine learning, reporting, graphing, and almost everything in Python.

If you want this extra functionality, you can leverage those libraries by importing them into your Python script. From here, you can call functions that aren’t natively part of your core GIS software.

PRO TIP: Use pip to install and manage your packages in Python

Python Libraries for GIS

If you’re going to build an all-star team for GIS Python libraries, this would be it. They all help you go beyond the typical managing, analyzing, and visualizing of spatial data. That is the true definition of a geographic information system.

1 . Arcpy

If you use Esri ArcGIS, then you’re probably familiar with the ArcPy library. ArcPy is meant for geoprocessing operations. But it’s not only for spatial analysis, it’s also for data conversion, management, and map production with Esri ArcGIS.

2 . Geopandas

Geopandas is like pandas meets GIS. But instead of straightforward tabular analysis, the geopandas library adds a geographic component. For overlay operations, Geopandas uses Fiona and Shapely, which are Python libraries of their own.


3 . GDAL

The GDAL/OGR library is used for translating between GIS formats and extensions. QGIS, ArcGIS, ERDAS, ENVI, GRASS GIS, and almost all other GIS software use it for translation in some way. At this time, GDAL/OGR supports 97 vector and 162 raster drivers.

4 . RSGISLib

The RSGISLib library is a set of remote sensing tools for raster processing and analysis. To name a few, it classifies, filters, and performs statistics on imagery. My personal favorite is the module for object-based segmentation and classification (GEOBIA).

5 . PyProj

The main purpose of the PyProj library is to work with spatial reference systems. It can project and transform coordinates with a range of geographic reference systems. PyProj can also perform geodetic calculations and distances for any given datum.

Python Libraries for Data Science

Data science extracts insights from data. It takes data and tries to make sense of it, such as by plotting it graphically or using machine learning. This list of Python libraries can do exactly this for you.

6 . NumPy

Numerical Python (NumPy library) takes your attribute table and puts it in a structured array. Once it’s in a structured array, it’s much faster for any scientific computing. One of the best things about it is how you can work with other Python libraries like SciPy for heavy statistical operations.

7 . Pandas

The Pandas library is immensely popular for data wrangling. It’s not only for statisticians. But it’s incredibly useful in GIS too. Computational performance is key for pandas. The success of Pandas lies in its data frame. Data frames are optimized to work with big data. They’re optimized to such a point that it’s something that Microsoft Excel wouldn’t even be able to handle.

8 . Matplotlib

When you’re working with thousands of data points, sometimes the best thing to do is plot it all out. Enter matplotlib. Statisticians use the matplotlib library for visual display. Matplotlib does it all. It plots graphs, charts, and maps. Even with big data, it’s decent at crunching numbers.

9 . Scikit

Lately, machine learning has been all the buzz. And with good reason. Scikit is a Python library that enables machine learning. It’s built on NumPy, SciPy, and Matplotlib. So, if you want to do any data mining, classification or ML prediction, the Scikit library is a decent choice.

10 . Re (regular expressions)

Regular expressions (Re) are the ultimate filtering tool. When there’s a specific string you want to hunt down in a table, this is your go-to library. But you can take it a bit further like detecting, extracting and replacing with pattern matching.
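As a small illustration of detecting, extracting and replacing with the standard re module, the sketch below assumes a made-up parcel ID format (two capital letters, a hyphen, four digits). The pattern and the helper names are hypothetical.

```python
import re

# Hypothetical ID format: two capital letters, a hyphen, four digits (e.g. AB-1234).
PARCEL_ID = re.compile(r"\b([A-Z]{2})-(\d{4})\b")


def has_id(text):
    """Detect: does the text contain at least one parcel ID?"""
    return PARCEL_ID.search(text) is not None


def extract_ids(text):
    """Extract: return every parcel ID found, in order of appearance."""
    return [match.group(0) for match in PARCEL_ID.finditer(text)]


def swap_id_parts(text):
    """Replace: rewrite each ID as digits-letters using backreferences."""
    return PARCEL_ID.sub(r"\2-\1", text)
```

The same compiled pattern can be applied to every value in a table column, for instance inside a list comprehension over the rows.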

11 . ReportLab

ReportLab is one of the most satisfying libraries on this list. I say this because GIS often lacks sufficient reporting capabilities. Especially, if you want to create a report template, this is a fabulous option. I don’t know why the ReportLab library falls a bit off the radar because it shouldn’t.

12 . ipyleaflet

If you want to create interactive maps, ipyleaflet is a fusion of Jupyter notebook and Leaflet. You can control an assortment of customizations like loading basemaps, geojson, and widgets. It also gives a wide range of map types to pick from including choropleth, velocity data, and side-by-side views.

13 . Folium

Just like ipyleaflet, Folium allows you to leverage leaflet to build interactive web maps. It gives you the power to manipulate your data in Python, then you can visualize it with the leading open-source JavaScript library.

14 . Geemap

Geemap is intended more for science and data analysis using Google Earth Engine (GEE). Although anyone can use this Python library, scientists and researchers specifically use it to explore the multi-petabyte catalog of satellite imagery in GEE for their specific applications and uses with remote sensing data.

15 . LiDAR

Simply named the LiDAR Python Package, its purpose is to process and visualize Light Detection and Ranging (LiDAR) data. For example, it includes tools to smooth, filter, and extract topological properties from digital elevation model (DEM) data. Although I don’t see integration with raw LAS files, it serves its purpose for terrain and hydrological analysis.

PRO TIP: If you need a quick and dirty list of functions for Python libraries, check out DataCamp’s Cheat Sheets.

The Python Libraries All-Star Team

These are the Python libraries we thought were stand-outs for GIS and data science.

Now, it’s time to turn it over to you.

If you could build an all-star team of Python libraries, who would you put on your team?


To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics and other tools. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message. [6] Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.

Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines, or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Vitaly Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information". [7]

Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention. [8]

Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united scientific and information visualization. [9]

In the commercial environment data visualization is often referred to as dashboards. Infographics are another very common form of data visualization.

Characteristics of effective graphical displays

Professor Edward Tufte explained that users of information displays are executing particular analytical tasks such as making comparisons. The design principle of the information graphic should support the analytical task. [11] As William Cleveland and Robert McGill show, different graphical elements accomplish this more or less effectively. For example, dot plots and bar charts outperform pie charts. [12]

In his 1983 book The Visual Display of Quantitative Information, Edward Tufte defines 'graphical displays' and principles for effective graphical display in the following passage: "Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should:

  • show the data
  • induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
  • avoid distorting what the data has to say
  • present many numbers in a small space
  • make large data sets coherent
  • encourage the eye to compare different pieces of data
  • reveal the data at several levels of detail, from a broad overview to the fine structure
  • serve a reasonably clear purpose: description, exploration, tabulation, or decoration
  • be closely integrated with the statistical and verbal descriptions of a data set.

Graphics reveal data. Indeed graphics can be more precise and revealing than conventional statistical computations." [13]

For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, the direction of movement, and temperature. The line width illustrates a comparison (size of the army at points in time), while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn." [13]

Not applying these principles may result in misleading graphs, distorting the message, or supporting an erroneous conclusion. According to Tufte, chartjunk refers to the extraneous interior decoration of the graphic that does not enhance the message or gratuitous three-dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible. [13]

The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience b) Designing graphics that can stand alone outside the report's context and c) Designing graphics that communicate the key messages in the report. [14]

Quantitative messages

Author Stephen Few described eight types of quantitative messages that users may attempt to understand or communicate from a set of data and the associated graphs used to help communicate the message:

  1. Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend.
  2. Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by sales persons (the category, with each sales person a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the sales persons.
  3. Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
  4. Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show comparison of the actual versus the reference amount.
  5. Frequency distribution: Shows the number of observations of a particular variable for given interval, such as the number of years in which the stock market return is between intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may be used for this analysis. A boxplot helps visualize key statistics about the distribution, such as median, quartiles, outliers, etc.
  6. Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
  7. Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
  8. Geographic or geospatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic used. [6][15]

Analysts reviewing a set of data may consider whether some or all of the messages and graphic types above are applicable to their task and audience. The process of trial and error to identify meaningful relationships and messages in the data is part of exploratory data analysis.
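The frequency-distribution message in the list above can be sketched in plain Python: count the observations per interval for a histogram, and compute the summary statistics that a boxplot displays. This is only an illustration; the function names are hypothetical, and the bins are treated as adjacent intervals that share their edges, a simplification of the "0-10%, 11-20%" labels.

```python
from statistics import quantiles


def bin_counts(values, edges):
    """Count values in consecutive bins [edges[i], edges[i+1]).

    The last bin also includes its right edge. Values outside the
    overall range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for value in values:
        for i in range(len(counts)):
            low, high = edges[i], edges[i + 1]
            if low <= value < high or (i == last and value == high):
                counts[i] += 1
                break
    return counts


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)
```

With the hypothetical annual returns [2, 5, 12, 18, 25, 7, 31, 15] and edges [0, 10, 20, 30, 40], bin_counts gives the bar heights of the histogram, one count per interval.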

Visual perception and data visualization

A human can distinguish differences in line length, shape, orientation, distances, and color (hue) readily without significant processing effort; these are referred to as "pre-attentive attributes". For example, it may require significant time and effort ("attentive processing") to identify the number of times the digit "5" appears in a series of numbers, but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing. [16]

Compelling graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than surface area, it may be more effective to use a bar chart (which takes advantage of line length to show comparison) rather than pie charts (which use surface area to show comparison). [16]

Human perception/cognition and data visualization

Almost all data visualizations are created for human consumption. Knowledge of human perception and cognition is necessary when designing intuitive visualizations. [17] Cognition refers to processes in human beings like perception, attention, learning, memory, thought, concept formation, reading, and problem solving. [18] Human visual processing is efficient in detecting changes and making comparisons between quantities, sizes, shapes and variations in lightness. When properties of symbolic data are mapped to visual properties, humans can browse through large amounts of data efficiently. It is estimated that 2/3 of the brain's neurons can be involved in visual processing. Proper visualization provides a different approach to show potential connections, relationships, etc. which are not as obvious in non-visualized quantitative data. Visualization can become a means of data exploration.

Studies have shown that individuals used on average 19% fewer cognitive resources, and were 4.5% better able to recall details, when comparing data visualization with text. [19]

There is no comprehensive 'history' of data visualization. There are no accounts that span the entire development of visual thinking and the visual representation of data, and which collate the contributions of disparate disciplines. [20] Michael Friendly and Daniel J Denis of York University are engaged in a project that attempts to provide a comprehensive history of visualization. Contrary to general belief, data visualization is not a modern development. Stellar data, such as the locations of stars, has been visualized on the walls of caves (such as those found in Lascaux Cave in Southern France) since the Pleistocene era. [21] Physical artefacts such as Mesopotamian clay tokens (5500 BC), Inca quipus (2600 BC) and Marshall Islands stick charts (n.d.) can also be considered as visualizing quantitative information. [22] [23]

The first documented data visualization can be traced back to 1160 B.C. with the Turin Papyrus Map, which accurately illustrates the distribution of geological resources and provides information about quarrying of those resources. [24] Such maps can be categorized as thematic cartography, which is a type of data visualization that presents and communicates specific data and information through a geographical illustration designed to show a particular theme connected with a specific geographic area. The earliest documented forms of data visualization were various thematic maps from different cultures, and ideograms and hieroglyphs that provided and allowed interpretation of the information illustrated. For example, Linear B tablets of Mycenae provided a visualization of information regarding Late Bronze Age era trades in the Mediterranean. The idea of coordinates was used by ancient Egyptian surveyors in laying out towns; earthly and heavenly positions were located by something akin to latitude and longitude at least by 200 BC; and the map projection of a spherical earth into latitude and longitude by Claudius Ptolemy [c.85–c. 165] in Alexandria would serve as a reference standard until the 14th century. [24]

The invention of paper and parchment allowed further development of visualizations throughout history. One graph from the 10th or possibly 11th century is intended to be an illustration of the planetary movement, used in an appendix of a textbook in monastery schools. [25] The graph apparently was meant to represent a plot of the inclinations of the planetary orbits as a function of time. For this purpose, the zone of the zodiac was represented on a plane with a horizontal line divided into thirty parts as the time or longitudinal axis. The vertical axis designates the width of the zodiac. The horizontal scale appears to have been chosen for each planet individually, for the periods cannot be reconciled. The accompanying text refers only to the amplitudes. The curves are apparently not related in time.

By the 16th century, techniques and instruments for precise observation and measurement of physical quantities, and geographic and celestial position were well-developed (for example, a “wall quadrant” constructed by Tycho Brahe [1546–1601], covering an entire wall in his observatory). Particularly important were the development of triangulation and other methods to determine mapping locations accurately. [20] Very early, the measure of time led scholars to develop innovative ways of visualizing the data (e.g. Lorenz Codomann in 1596, Johannes Temporarius in 1596 [26] ).

French philosophers and mathematicians René Descartes and Pierre de Fermat developed analytic geometry and the two-dimensional coordinate system, which heavily influenced the practical methods of displaying and calculating values. Fermat and Blaise Pascal's work on statistics and probability theory laid the groundwork for what we now conceptualize as data. [20] According to the Interaction Design Foundation, these developments allowed and helped William Playfair, who saw potential for graphical communication of quantitative data, to generate and develop graphical methods of statistics. [17]

In the second half of the 20th century, Jacques Bertin used quantitative graphs to represent information "intuitively, clearly, accurately, and efficiently". [17]

John Tukey and Edward Tufte pushed the bounds of data visualization: Tukey with his new statistical approach of exploratory data analysis, and Tufte with his book "The Visual Display of Quantitative Information", paved the way for refining data visualization techniques for more than statisticians. With the progression of technology came the progression of data visualization, starting with hand-drawn visualizations and evolving into more technical applications, including interactive designs leading to software visualization. [27]

Programs like SAS, SOFA, R, Minitab, Cornerstone and more allow for data visualization in the field of statistics. For other, more focused and individual data visualization applications, programming languages and libraries such as D3, Python and JavaScript help make the visualization of quantitative data possible. Private schools have also developed programs to meet the demand for learning data visualization and associated programming libraries, including free programs like The Data Incubator or paid programs like General Assembly. [28]

Beginning with the symposium "Data to Discovery" in 2013, ArtCenter College of Design, Caltech and JPL in Pasadena have run an annual program on interactive data visualization. [29] The program asks: How can interactive data visualization help scientists and engineers explore their data more effectively? How can computing, design, and design thinking help maximize research results? What methodologies are most effective for leveraging knowledge from these fields? By encoding relational information with appropriate visual and interactive characteristics to help interrogate, and ultimately gain new insight into data, the program develops new interdisciplinary approaches to complex science problems, combining design thinking and the latest methods from computing, user-centered design, interaction design and 3D graphics.

Data visualization involves specific terminology, some of which is derived from statistics. For example, author Stephen Few defines two types of data, which are used in combination to support a meaningful analysis or visualization:

  • Categorical: Represent groups of objects with a particular characteristic. Categorical variables can be either nominal or ordinal. Nominal variables, for example gender, have no order between them. Ordinal variables are categories with an order, for example the age group someone falls into. [30]
  • Quantitative: Represent measurements, such as the height of a person or the temperature of an environment. Quantitative variables can be either continuous or discrete. Continuous variables capture the idea that measurements can always be made more precisely, while discrete variables have only a finite number of possibilities, such as a count of some outcomes or an age measured in whole years. [30]

The distinction between quantitative and categorical variables is important because the two types require different methods of visualization.

Two primary types of information displays are tables and graphs.

  • A table contains quantitative data organized into rows and columns with categorical labels. It is primarily used to look up specific values. For example, a table might have categorical column labels representing the name (a qualitative variable) and age (a quantitative variable), with each row of data representing one person (the sampled experimental unit or category subdivision).
  • A graph is primarily used to show relationships among data and portrays values encoded as visual objects (e.g., lines, bars, or points). Numerical values are displayed within an area delineated by one or more axes. These axes provide scales (quantitative and categorical) used to label and assign values to the visual objects. Many graphs are also referred to as charts. [31]

Eppler and Lengler have developed the "Periodic Table of Visualization Methods," an interactive chart displaying various data visualization methods. It includes six types of data visualization methods: data, information, concept, strategy, metaphor and compound. [32]

Bar chart

  • length/count
  • category
  • color
  • Presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.
  • A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value.
  • Some bar graphs present bars clustered in groups of more than one, showing the values of more than one measured variable. These clustered groups can be differentiated using color.
  • For example, a comparison of values, such as sales performance for several persons or businesses in a single time period.

Variable-width ("variwide") bar chart

  • category (size/count/extent in first dimension)
  • size/count/extent in second dimension
  • size/count/extent as area of bar
  • color
  • Includes most features of basic bar chart, above
  • Area of non-uniform-width bar explicitly conveys information of a third quantity that is implicitly related to first and second quantities from horizontal and vertical axes

Histogram

  • bin limits
  • count/length
  • color
  • An approximate representation of the distribution of numerical data. Divide the entire range of values into a series of intervals and then count how many values fall into each interval; this is called binning. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.
  • For example, determining frequency of annual stock market percentage returns within particular ranges (bins) such as 0-10%, 11-20%, etc. The height of the bar represents the number of observations (years) with a return % in the range represented by the respective bin.

Scatter plot

  • x position
  • y position
  • symbol/glyph
  • color
  • size
  • Uses Cartesian coordinates to display values for typically two variables for a set of data.
  • Points can be coded via color, shape and/or size to display additional variables.
  • Each point on the plot has an associated x and y term that determines its location on the Cartesian plane.
  • Scatter plots are often used to highlight the correlation between variables (x and y).

Scatter plot (3D)

  • position x
  • position y
  • position z
  • color
  • symbol
  • size
  • Similar to the 2-dimensional scatter plot above, the 3-dimensional scatter plot visualizes the relationship between typically 3 variables from a set of data.
  • Again, points can be coded via color, shape and/or size to display additional variables.

Network

  • node size
  • node color
  • tie thickness
  • tie color
  • Finding clusters in the network (e.g. grouping Facebook friends into different clusters).
  • Discovering bridges (information brokers or boundary spanners) between clusters in the network.
  • Determining the most influential nodes in the network (e.g. a company wants to target a small group of people on Twitter for a marketing campaign).
  • Finding outlier actors who do not fit into any cluster or are in the periphery of a network.

Pie chart

  • color
  • Represents one categorical variable which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents.
  • For example, the proportion of English native speakers worldwide.

Line chart

  • x position
  • y position
  • symbol/glyph
  • color
  • size
  • Represents information as a series of data points called 'markers' connected by straight line segments.
  • Similar to a scatter plot except that the measurement points are ordered (typically by their x-axis value) and joined with straight line segments.
  • Often used to visualize a trend in data over intervals of time (a time series), thus the line is often drawn chronologically.

Streamgraph

  • width
  • color
  • time (flow)
  • A type of stacked area graph which is displaced around a central axis, resulting in a flowing shape.
  • Unlike a traditional stacked area graph in which the layers are stacked on top of an axis, in a streamgraph the layers are positioned to minimize their "wiggle".
  • Streamgraphs display data with only positive values, and are not able to represent both negative and positive values.
  • For example, the music listened to by a user over the start of the year 2012.

Treemap

  • size
  • color
  • A method for displaying hierarchical data using nested figures, usually rectangles.
  • For example, disk space by location / file type.

Gantt chart

  • color
  • time (flow)
  • Type of bar chart that illustrates a project schedule.
  • Modern Gantt charts also show the dependency relationships between activities and current schedule status.
  • For example, used in project planning.

Heat map

  • color
  • categorical variable
  • Represents the magnitude of a phenomenon as color in two dimensions.
  • There are two categories of heat maps:
    • cluster heat map: where magnitudes are laid out into a matrix of fixed cell size whose rows and columns are categorical data.
    • spatial heat map: where there is no matrix of fixed cell size; for example, a heat map showing population densities displayed on a geographical map.

Stripe graphic

  • x position
  • color
  • Uses a series of colored stripes chronologically ordered to visually portray long-term temperature trends.
  • Portrays a single variable, prototypically temperature over time to portray global warming.
  • Deliberately minimalist, with no technical indicia, to communicate intuitively with non-scientists. [33]
  • Can be "stacked" to represent plural series.
    • radial distance (dependent variable)
    • rotating angle (cycling through months)
    • color (passing years)
    • Portrays a single dependent variable—prototypically temperature over time to portray global warming
    • Dependent variable is progressively plotted along a continuous "spiral" determined as a function of (a) constantly rotating angle (twelve months per revolution) and (b) evolving color (color changes over passing years) [34]
    • x axis
    • y axis
    • A method for graphically depicting groups of numerical data through their quartiles.
    • Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles. Outliers may be plotted as individual points.
    • The two boxes graphed on top of each other represent the middle 50% of the data, with the line separating the two boxes identifying the median data value, and the top and bottom edges of the boxes representing the 75th and 25th percentile data points respectively.
    • Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions about the underlying statistical distribution, and are thus useful for getting an initial understanding of a data set. For example, comparing the distribution of ages between groups of people (e.g. males and females).
      or process
    • Represents a workflow, process or a step-by-step approach to solving a task.
    • The flowchart shows the steps as boxes of various kinds, and their order by connecting the boxes with arrows.
    • For example, outlining the actions to undertake if a lamp is not working, as shown in the diagram to the right.
    • attributes
    • value assigned to attributes
    • Displays multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point.
    • The relative position and angle of the axes is typically uninformative, but various heuristics, such as algorithms that plot data as the maximal total area, can be applied to sort the variables (axes) into relative positions that reveal distinct correlations, trade-offs, and a multitude of other comparative measures.
    • For example, comparing attributes/skills (e.g. communication, analytical, IT skills) learnt across different university degrees (e.g. mathematics, economics, psychology).
    • all possible logical relations between a finite collection of different sets.
    • Shows all possible logical relations between a finite collection of different sets.
    • These diagrams depict elements as points in the plane, and sets as regions inside closed curves.
    • A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set.
    • The points inside a curve labelled S represent elements of the set S, while points outside the boundary represent elements not in the set S. This lends itself to intuitive visualizations; for example, the set of all elements that are members of both sets S and T, denoted S ∩ T and read "the intersection of S and T", is represented visually by the area of overlap of the regions S and T. In Venn diagrams, the curves are overlapped in every possible way, showing all possible relations between the sets.
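The set relations that a Venn diagram depicts map directly onto Python's built-in `set` type. The sketch below computes each region of a two-set diagram; the sets `S` and `T` and their elements are made up for illustration and do not come from the text.

```python
# Illustrative sets; the element values are made up for this example.
S = {1, 2, 3, 4}
T = {3, 4, 5}

only_S = S - T   # region inside curve S but outside curve T
both = S & T     # overlap region: the intersection of S and T
only_T = T - S   # region inside curve T but outside curve S
either = S | T   # everything inside at least one curve

print(sorted(only_S), sorted(both), sorted(only_T), sorted(either))
```

Each variable corresponds to one region of the drawing, so the same values could be used to label or size the regions of a plotted diagram.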

    Interactive data visualization enables direct actions on a graphical plot to change elements and link between multiple plots. [35]

    Interactive data visualization has been a pursuit of statisticians since the late 1960s. Examples of the developments can be found on the American Statistical Association video lending library. [36]

    Common interactions include:

    • Brushing: works by using the mouse to control a paintbrush, directly changing the color or glyph of elements of a plot. The paintbrush is sometimes a pointer and sometimes works by drawing an outline of sorts around points; the outline is sometimes irregularly shaped, like a lasso. Brushing is most commonly used when multiple plots are visible and some linking mechanism exists between the plots. There are several different conceptual models for brushing and a number of common linking mechanisms. Brushing scatterplots can be a transient operation, in which points in the active plot retain their new characteristics only while they are enclosed or intersected by the brush, or it can be a persistent operation, so that points retain their new appearance after the brush has been moved away. Transient brushing is usually chosen for linked brushing, as we have just described.
    • Painting: Persistent brushing is useful when we want to group the points into clusters and then proceed to use other operations, such as the tour, to compare the groups. It is becoming common terminology to call the persistent operation painting.
    • Identification: which could also be called labeling or label brushing, is another plot manipulation that can be linked. Bringing the cursor near a point or edge in a scatterplot, or a bar in a barchart, causes a label to appear that identifies the plot element. It is widely available in many interactive graphics, and is sometimes called mouseover.
    • Scaling: maps the data onto the window, and changes in the area of the mapping function help us learn different things from the same plot. Scaling is commonly used to zoom in on crowded regions of a scatterplot, and it can also be used to change the aspect ratio of a plot, to reveal different features of the data.
    • Linking: connects elements selected in one plot with elements in another plot. The simplest kind of linking is one-to-one, where both plots show different projections of the same data, and a point in one plot corresponds to exactly one point in the other. When using area plots, brushing any part of an area has the same effect as brushing it all and is equivalent to selecting all cases in the corresponding category. Even when some plot elements represent more than one case, the underlying linking rule still links one case in one plot to the same case in other plots. Linking can also be by categorical variable, such as by a subject id, so that all data values corresponding to that subject are highlighted in all the visible plots.
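Brushing and one-to-one linking can be sketched without any plotting library. In this minimal example, a rectangular brush selects cases in one plot and the same cases are looked up in a second plot. The data, the `brush` and `linked_points` helpers, and the plot names are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical data: each case has an id and a position in two different plots.
cases = [
    {"id": 0, "plot_a": (1.0, 2.0), "plot_b": (10.0, 0.5)},
    {"id": 1, "plot_a": (2.0, 2.5), "plot_b": (20.0, 0.1)},
    {"id": 2, "plot_a": (4.0, 1.0), "plot_b": (30.0, 0.9)},
    {"id": 3, "plot_a": (2.2, 5.0), "plot_b": (40.0, 0.4)},
]


def brush(cases, plot, x_min, x_max, y_min, y_max):
    """Return the ids of cases whose point in `plot` lies inside the brush rectangle."""
    selected = set()
    for case in cases:
        x, y = case[plot]
        if x_min <= x <= x_max and y_min <= y <= y_max:
            selected.add(case["id"])
    return selected


def linked_points(cases, plot, selected_ids):
    """One-to-one linking: return the points in `plot` that belong to the selected cases."""
    return [case[plot] for case in cases if case["id"] in selected_ids]


# Brush a rectangle in plot A, then find the same cases in plot B.
selected = brush(cases, "plot_a", 0.0, 2.5, 0.0, 3.0)
highlight = linked_points(cases, "plot_b", selected)
print(sorted(selected), highlight)
```

Transient brushing would recompute `selected` each time the rectangle moves, while persistent brushing (painting) would accumulate the ids across moves.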

    There are different approaches to the scope of data visualization. One common focus is on information presentation, such as Friedman (2008). Friendly (2008) presumes two main parts of data visualization: statistical graphics, and thematic cartography. [37] In this line, the "Data Visualization: Modern Approaches" (2007) article gives an overview of seven subjects of data visualization: [38]

      & resources
    • Displaying connections
    • Displaying data
    • Displaying news
    • Displaying websites
    • Tools and services

    All these subjects are closely related to graphic design and information representation.

    On the other hand, from a computer science perspective, Frits H. Post in 2002 categorized the field into sub-fields: [9] [39]

    Within The Harvard Business Review, Scott Berinato developed a framework to approach data visualisation. [40] To start thinking visually, users must consider two questions: 1) what you have and 2) what you’re doing. The first step is identifying what data you want visualised. It is either data-driven, like profit over the past ten years, or a conceptual idea, like how a specific organisation is structured. Once this question is answered, one can then focus on whether they are trying to communicate information (declarative visualisation) or trying to figure something out (exploratory visualisation). Scott Berinato combines these questions to give four types of visual communication that each have their own goals. [40]

    These four types of visual communication are as follows:

    • idea illustration (conceptual & declarative). [40]
      • Used to teach, explain and/or simplify concepts. For example, organisation charts and decision trees.
      • Used to discover, innovate and solve problems. For example, a whiteboard after a brainstorming session.
      • Used to spot trends and make sense of data. This type of visual is more common with large and complex data where the dataset is somewhat unknown and the task is open-ended.
      • The most common and simple type of visualisation used for affirming and setting context. For example, a line graph of GDP over time.

      Data presentation architecture (DPA) is a skill-set that seeks to identify, locate, manipulate, format and present data in such a way as to optimally communicate meaning and proper knowledge.

      Historically, the term data presentation architecture is attributed to Kelly Lautt: [a] "Data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics in discovering valuable information from data and making it usable, relevant and actionable with the arts of data visualization, communications, organizational psychology and change management in order to provide business intelligence solutions with the data scope, delivery timing, format and visualizations that will most effectively support and drive operational, tactical and strategic behaviour toward understood business (or organizational) goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with data visualization, data presentation architecture is a much broader skill set that includes determining what data on what schedule and in what exact format is to be presented, not just the best way to present data that has already been chosen. Data visualization skills are one element of DPA."

      Objectives

      DPA has two main objectives:

      • To use data to provide knowledge in the most efficient manner possible (minimize noise, complexity, and unnecessary data or detail given each audience's needs and roles)
      • To use data to provide knowledge in the most effective manner possible (provide relevant, timely and complete data to each audience member in a clear and understandable manner that conveys important meaning, is actionable and can affect understanding, behavior and decisions)

      Scope

      With the above objectives in mind, the actual work of data presentation architecture consists of:

      Traceback (most recent call last):
        File "C:UsersmeAppDataRoamingBlender", line 162, in execute
          if v.lstrip("-+").isdigit():
      AttributeError: 'IDPropertyGroup' object has no attribute 'lstrip'

      Basemaps - NaN cast error

      The error that goes with this

      Originally posted by @MikeDabrowski in

      Adding Mac OSX & Linux support in documentation

      I saw your great wiki and the instructions for installing the GDAL Python bindings inside Blender on this page: It's very helpful, but it only covers installation on Windows. I don't use Windows. If you agree, I would like to add other platforms to your wiki.

      Mac OS X
      Tested on Yosemite 10.10 and Blender 2.74

      1) Install Xcode and MacPorts from this link:

      2) Install GDAL and the GDAL Python bindings. Open a terminal from Spotlight or from Applications => Utilities => Terminal, then type with administrative rights:

      sudo port install gdal py34-gdal

      3) Copy osgeo folder from python bindings to blender

      cp -rf /opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/osgeo /where_you_put_blender_on_your_mac/Blender/

      Replace where_you_put_blender_on_your_mac with the path where you run or install Blender

      Test it in the Blender Python console, as in the Windows installation.
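A quick way to run that test is to check whether the `osgeo` package is visible to the interpreter before importing it. This is only a sketch: `gdal_bindings_available` is a hypothetical helper name, and the snippet assumes it is pasted into Blender's Python console (it also runs in any plain Python 3 interpreter).

```python
import importlib.util


def gdal_bindings_available():
    """Return True if the `osgeo` package can be found on the current Python path."""
    return importlib.util.find_spec("osgeo") is not None


if gdal_bindings_available():
    # Only import GDAL once we know the package is on the path.
    from osgeo import gdal
    print("GDAL version:", gdal.VersionInfo())
else:
    print("osgeo not found: check where the osgeo folder was copied")
```

If the second message appears, the `osgeo` folder is not in a directory that Blender's bundled Python searches.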

      I think there is a mistake in the wiki here: "Finally, to get GDAL working in Blender, just copy osgeo folder in Python tree folder of Blender (C:Program FilesBlender FoundationBlender2.70pythonlibsite-packages)." If I put the osgeo folder in the path you recommend (python/lib/site-packages), I'm not able to load GDAL from Blender. When I put osgeo in Blender's module folder, it works!

      Sorry for my poor English, I'm French.

      No imageIO module

      No imaging library available. ImageIO module was not correctly installed. Please reinstall it or try to install Python GDAL or Pillow module

      This is the problem I get when I start BlenderGIS. I have reinstalled it 3 times. Thank you.

      Gaps between DEMs when trying to achieve tiled terrain project

      I'm working to take assets imported using BlenderGIS and then work on them with Armory, so I can interact with the terrain and fly through it. Due to the size of some of the rasters (20000+ pixels), I am hitting WebGL limitations within Armory exports.

      To resolve this, I attempted to cut my heightmap up in QGIS and load individual tiles with BlenderGIS, but I got the following gaps between DEMs which were impossible to join:

      I spotted this had been mentioned before in following posts:

      So I switched between pxLoc='CENTER' and Loc='CORNER' in operators/ but neither made a difference.

      So I took your advice in one of the posts and just imported the whole heightmap and looked for another route to tile. Having found this script I was able to slice up the mesh into 16 tiles (separate objects):

      And started importing sat images that I had already split into tiles within QGIS, this appeared to look nice and worked well:

      However when I zoomed into the edges I had a similar gap issue:

      I feel like I'm getting closer but would appreciate a little help trying to reduce the gap issues.

      The entire sat image in this test is 10000 x 10000 and each tile is 2500 x 2500.
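The tiling arithmetic described here (a 10000 x 10000 image cut into 2500 x 2500 tiles) can be sketched as follows. This is an illustrative helper, not the script used in this post: `tile_windows` and its `overlap` parameter are hypothetical names, and the optional overlap is only an assumption about one way to make neighbouring tiles share edge pixels.

```python
def tile_windows(width, height, tile, overlap=0):
    """Return (col_off, row_off, win_width, win_height) for each tile, row by row.

    `overlap` extends each tile to the right and down by that many pixels,
    clipped at the raster edge, so that neighbouring tiles share edge pixels.
    """
    windows = []
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            win_w = min(tile + overlap, width - col_off)
            win_h = min(tile + overlap, height - row_off)
            windows.append((col_off, row_off, win_w, win_h))
    return windows


windows = tile_windows(10000, 10000, 2500)
print(len(windows))   # 4 x 4 grid of tiles
print(windows[0])     # top-left tile
print(windows[-1])    # bottom-right tile
```

Each tuple is a pixel window (offsets and sizes) that could be passed to whichever tool does the actual cropping. Calling `tile_windows(10000, 10000, 2500, overlap=1)` gives windows that are one pixel wider and taller wherever a neighbouring tile exists.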

      I'm using the following Python to extract the square HM from the source asc:

      And the following to generate tiled sat images:

      I'm running Blender 2.8 with your latest BlenderGIS build. The projection of the project is WGS 84 / UTM zone 58S.

      .blend file for reference:!AjCedBZJ5Eh4i3-eifFqf19IZefa

      A couple of the Sat tiles:!AjCedBZJ5Eh4jADXJQmz8O3lyLja

      Entire square .asc heightmap:!AjCedBZJ5Eh4jADXJQmz8O3lyLja

      Place the Georef Cam higher

      Sometimes I get black holes when I render the image:

      This happens when there is a single peak that is higher than surrounding terrain. Is it possible to have the camera higher by default, so it's above all parts of the DEM?

      Get SRTM TimeoutError: [WinError 10060]

      I got this error when loading the SRTM file. I tried other locations but got the same error.

      TimeoutError: [WinError 10060] An error occurred during the connection attempt since the connected party did not respond properly after a period of time, or an error occurred in the established connection since the connected host could not respond.
