Abstract
This paper explains the basic concepts of social big data and its integrated analysis. First, we outline the real-world data, open data, and social data that compose social big data and give examples of each. After describing the interactions among these three kinds of data, we introduce the basic concepts of integrated analysis based on the “Ishikawa concept.” We then explain the flow of integrated analysis in line with these basic concepts and introduce a data model approach for integrated analysis. Building on this, integrated hypotheses and integrated analysis are explained concretely through several use cases in another paper in this issue, “Social Big Data: Case Studies.”
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 20K12081, Tokyo Metropolitan University Grant-in-Aid for Research on Priority Areas, and Nomura School of Advanced Management Research Grant.
Appendix Social Big Data Model
Our Social Big Data (SBD hereafter) model uses the mathematical concept of a family, a collection of sets, as the basis for its data structures. A family serves as an apparatus for bridging the gap between data management operations and data analysis operations.
Basically, our database is a Family. A Family is either an Indexed family or a Non-Indexed family. A Non-Indexed family is a collection of sets.
Non-Indexed and Indexed families are defined as follows:
- {Set} is a Non-Indexed family with Set as its element.
- {Set_i} is an Indexed family with Set_i as its i-th element. In the notation i: Index, Index is called the indexing set and i is an element of Index.
- Set is {<time space object>}.
- Set_i is {<time space object>}_i. Here, object is an identifier of arbitrary identifiable user-provided data, e.g., records, objects, and multimedia data appearing in social big data. Time and space are universal keys across multiple sources of social big data.
- {Indexed family_i} is also an Indexed family with Indexed family_i as its i-th element. In other words, an Indexed family can constitute a hierarchy of sets.
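The definitions above can be sketched in Python as follows. This is only one possible encoding, not the authors' implementation: the names `Element`, `SBDSet`, `non_indexed`, `indexed`, and `hierarchy`, the use of strings for time and space, and all sample values are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Element(NamedTuple):
    """One <time space object> triple (field names are illustrative)."""
    time: str   # timestamp, here an ISO 8601 date string (assumption)
    space: str  # location key, here a place name (assumption)
    obj: str    # identifier of the user-provided data


# A Set is a collection of <time space object> triples.
SBDSet = FrozenSet[Element]

# Non-Indexed family {Set}: a plain collection of sets.
non_indexed: List[SBDSet] = [
    frozenset({Element("2019-04-01", "Tokyo", "tweet-1")}),
    frozenset({Element("2019-04-02", "Kyoto", "photo-7")}),
]

# Indexed family {Set_i}: each index i of the indexing set maps to Set_i.
indexed: Dict[str, SBDSet] = {
    "twitter": frozenset({
        Element("2019-04-01", "Tokyo", "tweet-1"),
        Element("2019-04-03", "Tokyo", "tweet-2"),
    }),
    "flickr": frozenset({Element("2019-04-02", "Kyoto", "photo-7")}),
}

# {Indexed family_i}: an Indexed family whose elements are themselves
# Indexed families, which yields a hierarchy of sets.
hierarchy: Dict[str, Dict[str, SBDSet]] = {"2019": indexed}

# A singleton family is treated interchangeably with the one set it holds.
singleton_family: List[SBDSet] = [indexed["flickr"]]
```

A dictionary is used for the Indexed family because it makes the indexing set explicit as the set of keys; Python dictionaries also preserve insertion order, which matters for order-sensitive operations.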
Please note that the following concepts are used interchangeably in this study.
- Singleton family \( \Leftrightarrow \) set
- Singleton set \( \Leftrightarrow \) element
If operations that construct a family out of a collection of sets and operations that deconstruct a family into a collection of sets are provided in addition to the family-dedicated and set-dedicated operations, SBD applications can be described in an integrated fashion by our proposed model.
SBD consists of Family data management operations and Family data mining operations. Family data management operations are further divided into Intra Family operations and Inter Family operations.
1) Intra Family Data Management Operations
a) Intra Indexed Intersect (i: Index Db p(i)) returns a singleton family (i.e., a set) obtained by intersecting the sets that satisfy the predicate p(i). The database Db is a Family; this will not be restated hereafter.
b) Intra Indexed Union (i: Index Db p(i)) returns a singleton family obtained by taking the union of the sets that satisfy p(i).
c) Intra Indexed Difference (i: Index Db p(i)) returns a singleton family, that is, the first set satisfying p(i) minus all the other sets satisfying p(i).
d) Indexed Select (i: Index Db p1(i) p2(i)) returns an Indexed family with respect to i (the index is preserved) in which the element sets satisfy the predicate p1(i) and the elements of those sets satisfy the predicate p2(i). In the special case where p1(i) is true, this operation returns the whole Indexed family. In the special case of a singleton family, Indexed Select reduces to Select (a relational operation).
e) Indexed Project (i: Index Db p(i) a(i)) returns an Indexed family in which the element sets satisfy p(i) and the elements of those sets are projected according to the attribute specification a(i). This also extends the relational Project.
f) Intra Indexed cross product (i: Index Db p(i)) returns a singleton family obtained by taking the product of the sets that satisfy p(i). This is an extension of the Cartesian product, one of the relational operators.
g) Intra Indexed Join (i: Index Db p1(i) p2(i)) returns a singleton family obtained by joining the sets that satisfy p1(i) based on the join predicate p2(i). This is an extension of join, one of the relational operators.
h) Select-Index (i: Index Db p(i)) returns the indices i: Index of the sets Set_i that satisfy p(i). In the special case where p(i) is true, it returns all indices.
i) Make-indexed family (Index Non-Indexed Family) returns an Indexed family. This operator requires order-compatibility, that is, i must correspond to the i-th set of the Non-Indexed family.
j) Partition (i: Index Db p(i)) returns an Indexed family. Partition makes an Indexed family out of a given set (i.e., a singleton family, with or without an index) by grouping its elements with respect to p (i: Index). This is an extension of “groupby” as a relational operator.
k) ApplyFunction (i: Index Db f(i)) applies f(i) to the i-th set of Db, where f(i) takes a set as a whole and returns another set, possibly a singleton set (as an aggregate function does). This returns an Indexed family. f(i) can be defined by users.
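Several of the Intra Family operations can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: an Indexed family is modelled as a dict from index to frozenset, elements are plain (time, space, object) tuples, predicates are ordinary functions, the element predicate of Indexed Select is assumed to take a single element, and all function names and sample data are hypothetical.

```python
from functools import reduce


def select_index(db, p):
    """Select-Index: the indices i whose sets satisfy p(i)."""
    return [i for i in db if p(i)]


def _selected(db, p):
    # Sets whose index satisfies p, in the family's (insertion) order.
    return [db[i] for i in select_index(db, p)]


def intra_indexed_union(db, p):
    """Union of all sets whose index satisfies p (a singleton family)."""
    return reduce(lambda a, b: a | b, _selected(db, p), frozenset())


def intra_indexed_intersect(db, p):
    """Intersection of all sets whose index satisfies p."""
    sets = _selected(db, p)
    return reduce(lambda a, b: a & b, sets) if sets else frozenset()


def intra_indexed_difference(db, p):
    """First selected set minus all the other selected sets."""
    sets = _selected(db, p)
    return sets[0].difference(*sets[1:]) if sets else frozenset()


def indexed_select(db, p1, p2):
    """Keep sets whose index satisfies p1; keep elements satisfying p2."""
    return {i: frozenset(e for e in s if p2(e))
            for i, s in db.items() if p1(i)}


def partition(s, key):
    """Group the elements of one set into an Indexed family by key(e)."""
    groups = {}
    for e in s:
        groups.setdefault(key(e), set()).add(e)
    return {k: frozenset(v) for k, v in groups.items()}


def apply_function(db, f):
    """Apply f to each set as a whole; f returns a set."""
    return {i: f(s) for i, s in db.items()}


# Hypothetical data: two sources observing tourist spots.
A = ("2019-04-01", "Tokyo", "spot-A")
B = ("2019-04-01", "Kyoto", "spot-B")
C = ("2019-04-02", "Osaka", "spot-C")
db = {"twitter": frozenset({A, B}), "flickr": frozenset({A, C})}


def always(i):
    return True


common = intra_indexed_intersect(db, always)         # only A
everything = intra_indexed_union(db, always)         # A, B and C
by_space = partition(everything, lambda e: e[1])     # indexed by place
counts = apply_function(db, lambda s: frozenset({len(s)}))
```

Intra Indexed Difference depends on which set comes first, so the sketch relies on the insertion order of the dict; an implementation with an explicitly ordered indexing set would make that dependency visible.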
2) Inter Family Data Management Operations (index-compatible)
a) Indexed Intersect (i: Index Db1 Db2 p(i)) (union-compatible)
b) Indexed Union (i: Index Db1 Db2 p(i)) (union-compatible)
c) Indexed Difference (i: Index Db1 Db2 p(i)) (union-compatible)
d) Indexed Join (i: Index Db1 Db2 p1(i) p2(i))
e) Indexed cross product (i: Index Db1 Db2 p(i))
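The text gives only the signatures of the Inter Family operations. One plausible reading, assumed here and not confirmed by the source, is that each operation is applied index by index to the sets of Db1 and Db2 that share an index satisfying the predicate. The Python sketch below follows that reading; the dict-based encoding, the function names, the two-argument join predicate, and the sample data are all hypothetical.

```python
def _indexwise(db1, db2, p, op):
    # Apply op to db1[i] and db2[i] for every shared index i satisfying p.
    return {i: op(db1[i], db2[i]) for i in db1 if i in db2 and p(i)}


def indexed_intersect(db1, db2, p):
    return _indexwise(db1, db2, p, lambda a, b: a & b)


def indexed_union(db1, db2, p):
    return _indexwise(db1, db2, p, lambda a, b: a | b)


def indexed_difference(db1, db2, p):
    return _indexwise(db1, db2, p, lambda a, b: a - b)


def indexed_cross_product(db1, db2, p):
    return _indexwise(
        db1, db2, p,
        lambda a, b: frozenset((x, y) for x in a for y in b))


def indexed_join(db1, db2, p1, p2):
    # p1 selects indices; p2(x, y) is the join predicate on element pairs.
    return _indexwise(
        db1, db2, p1,
        lambda a, b: frozenset((x, y) for x in a for y in b if p2(x, y)))


# Hypothetical data: two families indexed by place.
T1 = ("2019-04-01", "Tokyo", "tweet-1")
T2 = ("2019-04-02", "Tokyo", "tweet-2")
P1 = ("2019-04-01", "Tokyo", "photo-1")
db1 = {"Tokyo": frozenset({T1, T2}),
       "Kyoto": frozenset({("2019-04-01", "Kyoto", "tweet-3")})}
db2 = {"Tokyo": frozenset({T1, P1}),
       "Osaka": frozenset({("2019-04-03", "Osaka", "photo-2")})}


def always(i):
    return True


shared = indexed_intersect(db1, db2, always)
same_day = indexed_join(db1, db2, always, lambda x, y: x[0] == y[0])
```

Indices that appear in only one family ("Kyoto" and "Osaka" here) are dropped, which is how this sketch interprets the index-compatibility requirement.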
3) Family Data Mining Operations
a) Cluster (Family method similarity {par}) returns a Family by default, where the Index is produced automatically. This is an unsupervised learner.
b) Make-classifier (i: Index set:Family learnMethod {par}) returns a classifier (Classify) together with its accuracy. This is a supervised learner.
c) Classify (Index/class set) returns an Indexed family with class as its index.
d) Make-frequent itemset (Db supportMin) returns an Indexed family of frequent itemsets that satisfy supportMin.
e) Make-association-rule (Db confidenceMin) creates association rules that satisfy confidenceMin from the frequent itemsets Db. This is also outside the range of our algebra.
Please note that the predicates and functions used in the above operations can be defined by the users in addition to the system-defined ones such as Count.
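As an illustration of one data mining operation, Make-frequent itemset can be sketched with a brute-force search. The source does not say how the resulting Indexed family is indexed or how supportMin is expressed; this sketch assumes the index is the itemset size k and that supportMin is a fraction of the transactions. The function name and the sample transactions are hypothetical, and a real system would use an algorithm such as Apriori or FP-growth rather than enumerating all combinations.

```python
from itertools import combinations


def make_frequent_itemset(db, support_min):
    """Return {k: frozenset of frequent k-itemsets} for an Indexed family.

    db maps a transaction index to a set of items; support_min is the
    minimum fraction of transactions that must contain an itemset.
    """
    transactions = list(db.values())
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        frequent = frozenset(
            frozenset(candidate)
            for candidate in combinations(items, k)
            if sum(1 for t in transactions if set(candidate) <= t) / n
            >= support_min
        )
        if not frequent:
            # No frequent k-itemset means no larger itemset can be frequent.
            break
        result[k] = frequent
    return result


# Hypothetical transactions indexed by an arbitrary id.
transactions = {
    1: frozenset({"a", "b", "c"}),
    2: frozenset({"a", "b"}),
    3: frozenset({"a", "c"}),
    4: frozenset({"b"}),
}
frequent = make_frequent_itemset(transactions, 0.5)
```

With these four transactions and a minimum support of 0.5, the single items a, b, and c and the pairs {a, b} and {a, c} are frequent, while {b, c} and {a, b, c} each occur in only one transaction.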
© 2021 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Ishikawa, H., Yamamoto, Y. (2021). Social Big Data: Concepts and Theory. In: Hameurlain, A., Tjoa, A.M., Chbeir, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII. Lecture Notes in Computer Science(), vol 12630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62919-2_3
DOI: https://doi.org/10.1007/978-3-662-62919-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-62918-5
Online ISBN: 978-3-662-62919-2
eBook Packages: Computer Science, Computer Science (R0)